02-18-2023, 11:38 AM | #16 |
Wizard
Posts: 3,007
Karma: 18401861
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
I am running 64-bit Linux, but because of Illegal Instruction errors, I have been running the 32-bit version of k2pdfopt-2.53. Unfortunately, the first command above fails after a dozen pages because k2pdfopt cannot allocate enough memory (it fails while trying to allocate a 273 MB buffer). The process must have hit the memory limit for 32-bit programs. I have 24 GB of RAM in the system, so this is a frustrating roadblock.
The memory usage for that first command is way higher than for the command given in post #6. Is it "-ocrd p" that causes the usage to skyrocket? |
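As a rough illustration of why 24 GB of installed RAM does not help a 32-bit build: a process with 32-bit pointers can address at most 2^32 bytes, however much physical memory the machine has. The sketch below is only back-of-the-envelope arithmetic. The names are hypothetical, the post's "MB" and "GB" are treated as MiB and GiB, and the share of the address space a program can actually use is typically lower than the ceiling and depends on the kernel and OS configuration.

```python
# Hypothetical back-of-the-envelope check; none of these names come from k2pdfopt.
GIB = 1024 ** 3
MIB = 1024 ** 2


def address_space_ceiling(pointer_bits: int) -> int:
    """Upper bound on the bytes a process can address with pointers of this width."""
    return 2 ** pointer_bits


installed_ram = 24 * GIB      # figure quoted in the post (assumed to mean GiB)
ceiling_32 = address_space_ceiling(32)
failed_request = 273 * MIB    # size of the failed allocation, per the post (assumed MiB)

print(f"32-bit ceiling: {ceiling_32 / GIB:.0f} GiB")
print(f"Installed RAM:  {installed_ram / GIB:.0f} GiB")
print(f"Failed request: {failed_request / ceiling_32:.1%} of the 32-bit ceiling")
```

The failed request is small next to the ceiling, which would be consistent with the process already holding most of its address space when the allocation was attempted, although the arithmetic alone cannot confirm that.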
02-18-2023, 06:08 PM | #17 |
Fuzzball, the purple cat
Posts: 1,282
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Sent you a PM. Please check. Would like to get your doc so I can validate the memory usage.
|
02-18-2023, 07:19 PM | #18 | |
Fuzzball, the purple cat
Posts: 1,282
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Quote:
On a virtual Fedora 37 machine with 16 GB RAM, with k2pdfopt v2.54, I get the following results from this command:

k2pdfopt -nt <XX> -mode copy -dpi 600 -ocr t -ocrd p src.pdf

32-bit, <XX> = 8: failed on page 13 trying to allocate a 1-GB bitmap
32-bit, <XX> = 4: same result as above
32-bit, <XX> = 2: completed successfully
64-bit, <XX> = 8: completed successfully (consumed up to 5.5 GB during the run)

With k2pdfopt v2.53:

64-bit fails in Fedora 37 because it was compiled on an earlier Linux kernel.
32-bit, <XX> = 8 and 4: fail on page 13 trying to allocate a 270-MB image
32-bit, <XX> = 2: fails on page 14
32-bit, <XX> = 1: fails on page 17

There is a known memory leak issue in v2.53. See the fix in v2.54.
Last edited by willus; 02-18-2023 at 08:13 PM. |
|
02-18-2023, 09:02 PM | #19 |
Fuzzball, the purple cat
Posts: 1,282
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
One note--you don't really need to use 600 dpi for the first command even if your source document is at 600 dpi. The purpose of the first command is solely to get an accurate OCR conversion, and with a typical font size of 10-12 points, Tesseract seems to be the most accurate right around 300 dpi. As the dpi is increased it actually becomes slightly less accurate.
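To put rough numbers on the memory side of that choice, here is a minimal sketch of how the size of one uncompressed page bitmap scales with dpi. It assumes a US Letter page (8.5 x 11 in) and either 1 byte per pixel (grayscale) or 3 bytes per pixel (RGB). Those are illustrative assumptions, not a description of how k2pdfopt or Tesseract allocate memory, and `bitmap_bytes` is a hypothetical helper.

```python
def bitmap_bytes(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int) -> int:
    """Size in bytes of an uncompressed bitmap for a page rendered at the given dpi."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


PAGE_INCHES = (8.5, 11.0)  # assumed US Letter page; the actual document may differ

for dpi in (300, 600):
    gray = bitmap_bytes(*PAGE_INCHES, dpi, bytes_per_pixel=1)
    rgb = bitmap_bytes(*PAGE_INCHES, dpi, bytes_per_pixel=3)
    print(f"{dpi} dpi: {gray / 1e6:.1f} MB grayscale, {rgb / 1e6:.1f} MB RGB")
```

Pixel count grows with the square of the dpi, so under these assumptions dropping from 600 to 300 dpi cuts each page bitmap to a quarter of its size.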
Last edited by willus; 02-19-2023 at 03:03 PM. Reason: Updated Tesseract accuracy link |
02-18-2023, 09:02 PM | #20 |
Wizard
Posts: 3,007
Karma: 18401861
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
Yes, that was the problem. I don't know why I was running 2.53 when I went to the download page less than a week ago, but somehow I ended up with that (most likely my fault). Version 2.54 works well for me. Thanks for figuring out what I did wrong.
|
02-18-2023, 09:04 PM | #21 |
Fuzzball, the purple cat
Posts: 1,282
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
The k2pdfopt download page was temporarily not showing v2.54 by mistake. It has been fixed.
|