Old 06-07-2011, 06:28 AM   #11
Iznogood
Thanks for all the tips and advice. I have done some informal testing with both ABBYY FineReader 10 (evaluation version) and OmniPage Pro 17 (purchased, refundable if not satisfied).

Setup:
I scanned one chapter from each of three different books: one Norwegian, one English, and one English with a lot of ñ, á, ó, italics and so on, to test detection of these. I do not have the heart to cut the books free from the spine, and I only have a flatbed scanner, so the scanned images are of course far from perfect. Each image file contains two pages of the book to save scanning time.

I loaded the test chapters into ABBYY and OmniPage, and let each program chew on them until they spat out some text files. ABBYY was instructed to split dual pages, preprocess and deskew the images, and detect page orientation. OmniPage did not have these options in load mode. I do not know whether they are applied by default in OmniPage, but the images were split correctly, so I believe that both programs performed the same operations.

Text recognition was done in automatic mode in both programs. Since a book consists of several hundred scans, I do not want to draw zones and do image processing on single images. I just pressed the "perform OCR" button.

Neither program was allowed any training. Since I had only one chapter of each book, training would not have achieved anything.

Results:
As has been pointed out, both these programs are very accurate, but I found that ABBYY was the more accurate in this case. It did not detect á, é and ò, but it detected emphasized text pretty well (the same goes for OmniPage, btw). ABBYY had fewer errors overall (I did not count errors, but there were significantly fewer in ABBYY). ABBYY also came out best at flagging possible errors.
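For anyone who wants to put a number on "fewer errors", a rough sketch in Python follows. It assumes you have a hand-proofread version of the same chapter to compare against; the function name count_differences is made up for illustration and is not part of either OCR program. It only gives an approximate character-level count, not a proper OCR accuracy metric.

```python
import difflib

def count_differences(reference, ocr_text):
    """Roughly count characters that differ between a proofread
    reference text and the OCR output (hypothetical helper)."""
    matcher = difflib.SequenceMatcher(None, reference, ocr_text, autojunk=False)
    errors = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced, deleted or inserted run is counted as the
            # longer of its two sides.
            errors += max(i2 - i1, j2 - j1)
    return errors
```

Running it on the output of both programs against the same reference text would give two counts that can be compared directly.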

When it came to proofreading, I found ABBYY's layout more appealing than OmniPage's (but I guess that is a matter of preference, and not part of OCR testing). I also liked ABBYY's automatic scanning feature (i.e. I tell the program to scan a page, it waits X seconds after completion while I turn the page and place the book on the scanner, and then it scans the next page without me telling it to), so I will be asking for a refund for OmniPage and buying a license for ABBYY instead.

Note to report:
Since the trial version of ABBYY is limited, not all the features these programs have were tested, and I was also not able to check how well ABBYY performed on page breaks. In my case it doesn't matter, because I want to edit page breaks manually (I use a script to insert <span class="newpage" id="pageXXX"/>, where XXX is the page number, at every page break).
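A script of that kind could be sketched roughly as follows in Python. This is only an illustration under assumptions: it supposes that page breaks in the exported text are marked with a form feed character, and the names PAGE_BREAK and insert_page_spans are made up for the example. The page number is written without zero padding; adjust if your ids need a fixed width.

```python
# Assumption: the OCR text export marks each page break with a form feed.
# Change PAGE_BREAK if your export uses a different marker.
PAGE_BREAK = "\f"

def insert_page_spans(text, first_page=1):
    """Replace each page-break marker with a numbered newpage span.

    first_page is the page number of the text before the first break,
    so the first inserted span gets id "page<first_page + 1>".
    """
    pages = text.split(PAGE_BREAK)
    out = [pages[0]]
    for offset, page in enumerate(pages[1:], start=1):
        number = first_page + offset
        out.append(f'<span class="newpage" id="page{number}"/>')
        out.append(page)
    return "".join(out)
```

For a chapter that starts on page 12, calling insert_page_spans(chapter_text, first_page=12) would tag the first break as page13, the next as page14, and so on.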