MobileRead Forums - View Single Post - k2pdfopt: optimizes PDFs for viewing on e-readers

willus · 07-20-2016, 11:00 AM

Quote:

Originally Posted by hmijail

I am trying to use k2pdfopt just to make a scanned PDF searchable. According to the help pages, I should basically be able to use -mode copy -ocr, but it's not working: the resulting PDF contains no OCR'd text.

The best I have managed is to use -as -ac -ocr -p2, which at least gets *some* of the text on one of the pages, but the result is a pretty scrambled PDF. The text dump itself is flowed into short lines.
If I add -mode copy at the beginning, no text comes out.

...

Maybe the problem is with the scan itself...

Welcome to MR. Yes, it does sound as if the issue is with your scan if you have to use -as and -ac (-p2 is not a correct option unless you have a space between the 'p' and the '2'). You do have Tesseract installed correctly, I take it? Can you PM me a link to your source PDF and I'll have a look?

I do have an OCR help page, though it doesn't have a lot of varying source formats--maybe I'll start an examples page with something mimicking your source file as the first example.
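
For anyone following along, here is a rough sketch of the two command lines discussed above, using only the options already mentioned in this thread. The file name scan.pdf and the helper function names are made up for illustration, and the script only prints the commands (a dry run) rather than invoking k2pdfopt. Whether -mode copy -ocr yields a searchable PDF for a given scan is exactly the open question here, so treat this as a starting point and not a confirmed recipe.

```shell
#!/bin/sh
# Dry-run sketch: prints the k2pdfopt command lines discussed in the thread.
# "scan.pdf" and the function names below are hypothetical placeholders.

# Copy mode plus OCR: the combination the original poster expected to
# add a searchable text layer to the scanned pages.
copy_cmd() {
    printf 'k2pdfopt -mode copy -ocr %s\n' "$1"
}

# Auto-straighten and auto-crop with OCR, limited to one page.
# Note the space between -p and the page number ("-p 2", not "-p2").
page_cmd() {
    printf 'k2pdfopt -as -ac -ocr -p %s %s\n' "$2" "$1"
}

copy_cmd scan.pdf
page_cmd scan.pdf 2
```

Copy either printed line into a terminal, with your own file name substituted, to try it for real.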