MobileRead Forums - View Single Post

jayman · 01-27-2010, 01:36 PM

Quote:

Originally Posted by =X=

There is ABBYY FineReader Express, which goes for about $50.

Also check out this page: ABBYY OCR Engine

There is also Google's "tesseract", which is an open source project and free of cost. There are a few GUIs built to interface with tesseract to make it easier to use, but they're not as sophisticated as the commercial products. The one I use is "Softi FreeOCR".

My experience with the latter is that it mostly works. If the starting source is a good scan with clean fonts, you'll get a good OCR reading; if it is not so good, there will be a lot of errors in the final product. My results have ranged from 90% to 100% accuracy per page.

With ABBYY FineReader the results are always very good, and the GUI makes it very easy to correct the final result. Scanning the same book, ABBYY gave me 99%-100% accuracy.

But if cash is the issue, you will be happy with tesseract. I have some command-line scripts that I wrote when I was using tesseract that will convert a PDF/PNG/JPG/GIF file to a text file. They are written in Perl; if you decide to go with tesseract, let me know and I can give you the scripts.

=X=
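
For anyone curious what a conversion script like the one =X= describes might look like, here is a rough sketch. It is not =X='s Perl code (which was not posted); it is a Python illustration that assumes a reasonably recent `tesseract` and Poppler's `pdftoppm` are both on the PATH. The function names are made up for this example, and GIF input only works if your tesseract build was compiled with GIF support.

```python
import subprocess
from pathlib import Path

# Image types handed straight to tesseract (GIF depends on the build).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def tesseract_command(image: Path, lang: str = "eng") -> list[str]:
    """Build the command line: tesseract <image> <output base> -l <lang>."""
    # tesseract appends ".txt" to the output base on its own.
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def pdf_to_png_command(pdf: Path, dpi: int = 300) -> list[str]:
    """Build a pdftoppm command that renders each PDF page to a PNG."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(pdf.with_suffix(""))]


def ocr_file(path: Path, lang: str = "eng") -> list[Path]:
    """OCR one image or PDF and return the text files that were written."""
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        # tesseract does not read PDFs directly, so render the pages first.
        subprocess.run(pdf_to_png_command(path), check=True)
        # pdftoppm names the pages <prefix>-1.png, <prefix>-2.png, ...
        images = sorted(path.parent.glob(f"{path.stem}-*.png"))
    elif suffix in IMAGE_SUFFIXES:
        images = [path]
    else:
        raise ValueError(f"unsupported file type: {path}")

    outputs = []
    for image in images:
        subprocess.run(tesseract_command(image, lang), check=True)
        outputs.append(image.with_suffix(".txt"))
    return outputs
```

Calling `ocr_file(Path("page1.png"))` would write `page1.txt` next to the image; for a PDF you get one text file per page. The 300 dpi default is just a common starting point for OCR, not a tested recommendation.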

Thanks. I got ABBYY FineReader and have scanned a book. I've used calibre to convert it into an EPUB, but the chapter marks aren't clickable, etc. Do I need a lot of computer knowledge to "fix" my book so that the chapter marks are clickable and I can adjust the margins, etc.?