Quote:
Originally Posted by hmijail
I am trying to use k2pdfopt just to make a scanned PDF searchable. According to the help pages, I should basically be able to use -mode copy -ocr, but it's not working: the resulting PDF contains no OCR'd text.
The best I have managed is to use -as -ac -ocr -p2, which at least gets *some* of the text on one of the pages, but the result is a pretty scrambled PDF. The text dump itself is reflowed into short lines.
If I add the -mode copy at the beginning, no text comes out.
...
Maybe the problem is with the scan itself...
Welcome to MR. Yes, it does sound as if the issue is with your scan if you have to use -as and -ac (-p2 is not a correct option unless you have a space between the 'p' and the '2'). You do have Tesseract installed correctly, I take it? Can you PM me a link to your source PDF so I can have a look?
I do have an OCR help page, though it doesn't cover many different source formats. Maybe I'll start an examples page, with something mimicking your source file as the first example.
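To illustrate the point about the space in the page option, here is a minimal Python sketch that only assembles a command line from the options mentioned in this thread (-mode copy, -ocr, -p). The helper name k2pdfopt_command and the file name scan.pdf are made up for illustration, and the script prints the command rather than running k2pdfopt. Whether these options alone produce a searchable copy still depends on the scan and on the Tesseract setup.

```python
import shlex


def k2pdfopt_command(src, pages=None, extra=()):
    """Build a k2pdfopt argument list (hypothetical helper; nothing is executed)."""
    # Options taken from this thread: copy mode plus OCR.
    args = ["k2pdfopt", "-mode", "copy", "-ocr"]
    if pages is not None:
        # The page list is its own argument: "-p 2", not "-p2".
        args += ["-p", str(pages)]
    args += list(extra)
    # Assumption: the source file goes after the options.
    args.append(src)
    return args


# "scan.pdf" is a placeholder file name.
cmd = k2pdfopt_command("scan.pdf", pages=2)
print(shlex.join(cmd))
```

Keeping the arguments as a list makes it hard to accidentally fuse an option with its value, which is exactly the -p2 slip above.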