MobileRead Forums - View Single Post

willus · 08-10-2015, 09:15 AM

Quote:

Originally Posted by anonymust

Is it possible to run OCR on the files with any of these programs?

Currently the PDF is in high resolution, and the text is pretty crisp and clear as an image.

But I fear that if I shrink or split the PDF, I will not be able to annotate/highlight certain text within the document (because it's an image, not text, if you know what I mean).

To make things more difficult, some of the text is in columns :/

The software Markom mentioned, k2pdfopt (open source, available for Mac), uses the Tesseract OCR engine if you turn on OCR. That does a pretty good job if the text is well defined, but it is slow, so you may have to run the conversion overnight. It should be okay with multiple columns, depending on the complexity of the layout. You can try things out on a few pages at a time. For an example of running the text menu version (like on the Mac) and turning on OCR, watch the 6-minute video on this help page.
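If you would rather script a short test run than step through the text menu, a rough sketch like the one below might help. It only builds the command line for a few pages with OCR on and prints it. The flags are assumptions taken from my memory of the k2pdfopt usage text, so check them against your installed version: `-ocr t` (Tesseract OCR), `-p` (page range), `-ui-` (skip the interactive menu), and `-x` (exit when done). The file name `scan.pdf` and the helper name are made up for illustration.

```python
import shlex


def build_k2pdfopt_command(pdf_path, pages=None, ocr=True, exe="k2pdfopt"):
    """Return an argument list for a non-interactive k2pdfopt run.

    All flags here are assumed from the k2pdfopt usage text; verify them
    against your installed version before relying on this.
    """
    cmd = [exe]
    if ocr:
        # Assumed: "-ocr t" selects the Tesseract OCR engine.
        cmd += ["-ocr", "t"]
    if pages:
        # Assumed: "-p" takes a page list such as "1-5".
        cmd += ["-p", pages]
    # Assumed: "-ui-" disables the text menu, "-x" exits on completion.
    cmd += ["-ui-", "-x", pdf_path]
    return cmd


if __name__ == "__main__":
    # Print the command for a 5-page trial so it can be inspected first.
    print(shlex.join(build_k2pdfopt_command("scan.pdf", pages="1-5")))
```

Once a few pages look right, the same list can be passed to `subprocess.run` with the page range dropped to convert the whole file, which is the part that may need to run overnight.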

I have read on these forums that the best OCR is done with ABBYY FineReader, which is a commercial program.