08-10-2015, 09:15 AM   #5
willus
Fuzzball, the purple cat
Posts: 1,303
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by anonymust
Is it possible with any of these programs to run OCR on the files?

Currently the PDF is in high resolution, and the text is pretty crisp and clear as an image.

But I fear that if I shrink or split the PDF I will not be able to annotate/highlight certain text within the document (because it's an image, not text, if you know what I mean).

To make things more difficult, some of the text is in columns :/

The software Markom mentioned, k2pdfopt (open source, available for Mac), uses the Tesseract OCR engine if you turn on OCR. It does a pretty good job if the text is well defined, but it is slow, so you may have to run the conversion overnight. It should be okay with multiple columns, depending on the complexity of the layout. You can try things out on a few pages at a time. For an example of running the text-menu version (as on the Mac) and turning on OCR, watch the 6-minute video on this help page.
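
If you would rather script the "few pages at a time" trial than go through the text menu, something like the rough sketch below might work. It only builds and prints a command line; it does not run anything. The file name scan.pdf and the helper build_k2pdfopt_command are made up for illustration, and the switches shown (-ocr t, -p, -ui-, -x) are my assumptions about the command-line options, so check them against the usage text of your k2pdfopt version before relying on them.

```python
import shlex


def build_k2pdfopt_command(pdf_path, first_page, last_page, exe="k2pdfopt"):
    """Build an argument list for a small trial OCR run (hypothetical helper)."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("need 1 <= first_page <= last_page")
    return [
        exe,
        pdf_path,
        "-ocr", "t",                        # assumption: "t" selects the Tesseract engine
        "-p", f"{first_page}-{last_page}",  # assumption: restrict the run to these pages
        "-ui-",                             # assumption: skip the interactive text menu
        "-x",                               # assumption: exit when the conversion finishes
    ]


# Try the first five pages before committing to an overnight run.
cmd = build_k2pdfopt_command("scan.pdf", 1, 5)
print(shlex.join(cmd))
```

Once the printed command looks right, you could pass the list to subprocess.run(cmd, check=True). Keeping the page range small lets you judge the OCR quality on your column layout before converting the whole file.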

I have read on these forums that the best OCR is done with ABBYY FineReader, which is a commercial program.

Last edited by willus; 08-10-2015 at 09:18 AM.