MobileRead Forums - View Single Post - Can you OCR the images inside of .pdf files?

shevirsy · 09-13-2014, 04:48 AM

Quote:

Originally Posted by Tex2002ans

I already linked to a Wikipedia article showing off a comparison of many different OCR programs in Post #13 right in this topic:

Thanks for the links. I know the Wikipedia article. It's depressing: ABBYY FineReader and that's about all. I was hoping for some missed gem.

Quote:

Most likely the only free OCR of note would be Tesseract (and most of the free OCR programs out there most likely use an outdated version of Tesseract in the backend).

I already explained many of the disadvantages of the free solutions above. Although you are free to read the Tesseract documentation and do much of the training/tweaking needed.

I stay away from these tools made to help front ends. I need a front-end; I am not the front-end. I have tried OCRFeeder. When it comes to a few pages, it can be better than typing them out.

Quote:

I personally would just err on the side of the paid OCR programs, ESPECIALLY when dealing with non-English works, or works with lots of accented characters. While the proprietary OCR programs are not zero dollars initially, they would save you A TON of time in all of your post-OCR processing steps (which is where you WILL spend most of your time). The more accurate/clean you can get your input, the less time you will have to spend cleaning it up and getting the document into a readable state.

You do have a point.

Quote:

Besides that, you can use GIMP/Inkscape/ImageMagick to manipulate the images just fine.

I prefer using all free software over proprietary whenever I can, but sadly, OCR is just one area where the free solutions don't hold much of a candle.

Well, Scan Tailor might help a lot more. But Jellby is right, and I won't name the users who just pad their post count. I just say "bye bye" and add them to my ignore list.