MobileRead Forums - View Single Post - k2pdfopt: optimizes PDFs for viewing on e-readers

hmijail · 07-20-2016, 08:31 AM

Hello,

I am trying to use k2pdfopt just to make a scanned PDF searchable. According to the help pages, I should basically be able to use -mode copy -ocr, but it's not working: the resulting PDF contains no OCR'd text.

The best I have managed is to use -as -ac -ocr -p2 , which at least gets *some* of the text on one of the pages, but the result is a pretty scrambled PDF. The text dump itself is reflowed into short lines.
If I add -mode copy at the beginning, no text comes out at all.
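
For anyone who wants to reproduce this, here is a minimal sketch of the two attempts and of a way to check whether the output actually has a text layer. It is only an illustration: the flag sets are the ones quoted above, while the helper names (`build_command`, `has_text_layer`, `dump_text`) and the file name are made up for the example, and the check assumes Poppler's `pdftotext` is installed. Nothing here is taken from the k2pdfopt documentation.

```python
import shutil
import subprocess

# Flag sets copied from the post above; the dictionary keys are
# hypothetical labels, not k2pdfopt terminology.
ATTEMPTS = {
    "mode_copy": ["-mode", "copy", "-ocr"],
    "as_ac_p2": ["-as", "-ac", "-ocr", "-p2"],
}


def build_command(attempt, pdf_path):
    """Return the argument list for one of the attempts described above."""
    return ["k2pdfopt", *ATTEMPTS[attempt], pdf_path]


def has_text_layer(dumped_text):
    """True if a text dump contains anything besides whitespace.

    pdftotext emits form feeds and newlines even for image-only pages,
    so whitespace alone is treated as "no OCR text".
    """
    return any(not ch.isspace() for ch in dumped_text)


def dump_text(pdf_path):
    """Dump the text of a PDF to a string using pdftotext (assumed installed)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command lines only; running them is left to the reader.
    for name in ATTEMPTS:
        print(" ".join(build_command(name, "scan.pdf")))
```

Calling `has_text_layer(dump_text(...))` on the file k2pdfopt writes would then show quickly whether a given flag combination produced any searchable text.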

So I would like to ask for some guidance on what to try next. I'm not posting the PDF publicly because it's a scan of personal documentation, but I can send it somewhere if that would help fix something.

Maybe the problem is with the scan itself. So I would suggest: maybe you could post some examples of full OCR scans on your web page, together with the command lines that created them? That way one could quickly get an idea of what should or shouldn't work.

I guess that this usage of k2pdfopt is almost off-topic for mobileread.com, so if I should ask somewhere else, please let me know.
