Quote:
Originally Posted by polarisrising
This is getting very close. The images look great in the pdf and they highlight correctly. But, when I go to import the pdf to Calibre, there are two issues: If I import it without changing the settings, it imports the pdf with the images embedded, and no OCR text. If I select the option to not import the images from the pdf, then the pages are all blank.
You may wish to try the -ocrout option, which dumps all of the OCR text to a plain-text (UTF-8) file:
-ocrout outfile.txt
You'll probably have to go through and clean it up a bit, but the OCR layer appears to be very good, so hopefully your editing will be minimal. I've attached the output from pages 20-25.
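If it helps with the cleanup, here is a rough Python sketch of the kind of first pass I mean. It is not part of the tool; the function names (clean_ocr_text, clean_file) are made up for illustration, and the rules are simple heuristics that assume the dump is ordinary UTF-8 text with hard line breaks. You would still want to proofread the result by hand.

```python
import re


def clean_ocr_text(text: str) -> str:
    """Apply a few conservative cleanups to raw OCR text (heuristics only)."""
    # Normalize Windows/old-Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Treat form feeds (sometimes used as page breaks) as line breaks.
    text = text.replace("\f", "\n")
    # Rejoin words split by a hyphen at a line end: "exam-\nple" -> "example".
    # Caveat: this also merges genuine compounds such as "well-\nknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Remove trailing spaces and tabs at the end of each line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Collapse three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"


def clean_file(src: str, dst: str) -> None:
    """Read src as UTF-8, clean it, and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        raw = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(clean_ocr_text(raw))
```

With the file name from the example above, you would call clean_file("outfile.txt", "outfile_clean.txt") and then edit the cleaned copy, keeping the original dump in case the hyphen rule merges something it shouldn't.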