07-20-2016, 11:00 AM   #1282
willus
Fuzzball, the purple cat
Posts: 1,299
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by hmijail
I am trying to use k2pdfopt just to make a scanned PDF searchable. According to the help pages, I should basically be able to use -mode copy -ocr, but it's not working: the resulting PDF contains no OCR'd text.

The best I have managed is to use -as -ac -ocr -p2, which at least gets *some* of the text on one of the pages, but the result is a pretty scrambled PDF. The text dump itself is flowed to short lines.
If I add the -mode copy at the beginning, no text comes out.

...

Maybe the problem is with the scan itself...

Welcome to MR. Yes, it does sound as if the issue is with your scan if you have to use -as and -ac (-p2 is not a correct option unless you have a space between the 'p' and the '2'). You do have Tesseract installed correctly, I take it? Can you PM me a link to your source PDF so I can have a look?

I do have an OCR help page, though it doesn't cover many different source formats--maybe I'll start an examples page with something mimicking your source file as the first example.
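
To illustrate the point about -p needing a space, here is a minimal Python sketch that assembles the command line from only the options named in this thread (-mode copy, -ocr, -p). The function names and the file name scan.pdf are hypothetical, and the TESSDATA_PREFIX check rests on the assumption that k2pdfopt finds its Tesseract language data through that environment variable. The sketch only prints the command; it does not run k2pdfopt.

```python
import os
import shlex


def build_k2pdfopt_command(source_pdf, pages=None):
    """Assemble a k2pdfopt argument list (hypothetical helper).

    Only options mentioned in the thread are used.
    """
    args = ["k2pdfopt", "-mode", "copy", "-ocr"]
    if pages is not None:
        # The option and its value are separate arguments: "-p 2", not "-p2".
        args += ["-p", str(pages)]
    args.append(source_pdf)
    return args


def tessdata_looks_configured(env=os.environ):
    """Rough check that Tesseract language data can be found.

    Assumption: the data folder is located via TESSDATA_PREFIX.
    """
    prefix = env.get("TESSDATA_PREFIX")
    return bool(prefix) and os.path.isdir(prefix)


if __name__ == "__main__":
    cmd = build_k2pdfopt_command("scan.pdf", pages=2)  # scan.pdf is a placeholder
    print(shlex.join(cmd))
    if not tessdata_looks_configured():
        print("TESSDATA_PREFIX is unset or not a folder; OCR may produce no text.")
```

Building the arguments as a list keeps "-p" and "2" as separate items, which is exactly the spacing the option requires.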