Old 11-06-2018, 08:28 AM   #1607
willus
Fuzzball, the purple cat
Posts: 1,303
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by polarisrising
This is getting very close. The images look great in the pdf and they highlight correctly. But, when I go to import the pdf to Calibre, there are two issues: If I import it without changing the settings, it imports the pdf with the images embedded, and no OCR text. If I select the option to not import the images from the pdf, then the pages are all blank.
You may wish to try the -ocrout option, which simply dumps all of the OCR text to a plain-text (UTF-8) file:

-ocrout outfile.txt

You'll probably have to go through and clean it up a bit, but the OCR layer appears to be very good, so hopefully your editing will be minimal. I've attached the output from pages 20-25.
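
If the cleanup turns out to be repetitive, a short script can take care of the mechanical part. The following is only a rough Python sketch and is not part of k2pdfopt: the function name clean_ocr_text and the cleaned.txt file name are made up for illustration, and the rules (strip trailing spaces, rejoin words split by a hyphen at a line end, collapse runs of blank lines) are assumptions about typical OCR output, not something tuned to this particular file.

```python
import re
import sys


def clean_ocr_text(text: str) -> str:
    """Apply a few generic cleanup rules to raw OCR text (hypothetical helper)."""
    # Normalize Windows line endings so the rules below only deal with "\n".
    text = text.replace("\r\n", "\n")
    # Drop trailing whitespace on every line.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    # Rejoin words split by a hyphen at the end of a line ("exam-\nple").
    # Caveat: this also merges genuine hyphenated compounds that happen
    # to break across a line, so review the result by hand.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse three or more consecutive newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names are examples): python clean_ocr.py outfile.txt cleaned.txt
    with open(sys.argv[1], encoding="utf-8") as src:
        cleaned = clean_ocr_text(src.read())
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.write(cleaned)
```

Anything beyond these rules (misread characters, stray headers and page numbers) would still need to be fixed by hand.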
Attached Files
File Type: txt outfile.txt (26.0 KB, 258 views)