MobileRead Forums - View Single Post - ABBYY FineReader

Maggy · 02-13-2009, 07:59 AM

Although I agree that Abbyy does a very fine job in general, I believe there is still lots of room for improvement.

Unfortunately Abbyy has no forum; IMHO it should have one. I hope they read this.

First of all, Search and Replace works one way only, selectable up or down. There is no continue from top. So what I always do is jump to the first page, search in all pages, jump back to the top, then search for the next word. Before I start doing searches I first walk through the document and make search/replace notes in a text editor.

When you replace words that appear both capitalised and in lower case, first replace the exact capitalised form, then run it again for lower case only. For example, if your document contains the French word musée and Musée but OCR skipped some accents, first search for Musee, then musee.
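If you export the text, the same two-pass replacement can be scripted outside Abbyy. This is only a minimal Python sketch: the function name `fix_accents`, the list of notes and the sample sentence are made up for illustration, and it works on plain exported text, not inside FineReader. Python's `str.replace` is case-sensitive, so each pair only touches its own spelling.

```python
def fix_accents(text, replacements):
    """Apply case-sensitive search/replace notes in the order given."""
    for wrong, right in replacements:
        # str.replace matches case exactly, so "Musee" never touches "musee"
        text = text.replace(wrong, right)
    return text


# Search/replace notes as they might be kept in a text editor (hypothetical)
notes = [("Musee", "Musée"), ("musee", "musée")]

sample = "Le Musee du Louvre est un musee."
print(fix_accents(sample, notes))
```

Keeping the notes as a list of pairs means the whole list can be rerun after every new export.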

Often I see groups of characters in different words that I want to replace. For example, yesterday I had a document in which, in several places, official, difficult and so on had ffd instead of ffic. Searching just for that group is much faster than searching for the different words.

Too bad it allows neither wildcards nor regular expressions.
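Regular expressions are available once the text has been exported, for example with Python's `re` module. The sketch below is an assumption about how the ffd/ffic case could be handled on exported text; `preview_and_replace` and the sample sentence are invented for illustration. It first lists every word containing the bad group, so you can check nothing legitimate gets caught, and then replaces the group.

```python
import re


def preview_and_replace(text, fragment, fix):
    """Return the words containing `fragment`, plus the corrected text."""
    # \w* on both sides grabs the whole word around the bad character group
    pattern = re.compile(r"\w*" + re.escape(fragment) + r"\w*")
    hits = sorted(set(pattern.findall(text)))
    fixed = text.replace(fragment, fix)
    return hits, fixed


sample = "The offdial report was diffdult to read."
hits, fixed = preview_and_replace(sample, "ffd", "ffic")
print(hits)   # words that will be changed
print(fixed)  # text after the replacement
```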

Never trust the layout of the preview; once you've turned it into a PDF it looks much better. Never try to edit the layout of the preview in Abbyy either; most likely it will ruin the layout of your final PDF. I'm still searching for the easiest way to correct layout errors in final files created by Abbyy. Currently the best way I can find is:
-see if the PDF is good enough
-if not, export as a Word document and see if it's easy to fix in Word
-if not, export as text, create a new Word document and a new style sheet

Abbyy can make a weird type of error while scanning 2-column index pages. It may think that they are 2 pages of a curved book and start trying to warp them, actually bending long lines so much they become unreadable. At first I avoided this by creating 2 scans per page, covering 1 column per scan. But there is an easier solution. Simply import the same scan twice; first select only the left column for OCR, then the right one. Merge to a single page using Word.

When you start Abbyy proofreading, never ever allow it to add extra spaces after dots, commas etc. On documents that give bad OCR results, first proofread almost blindfolded, adding ALL found words. Then open the dictionary editor, export the word list and read it in Notepad using the Courier font. You can now see the difference between m and rn and so on much more easily. Of course you'll have to remove all misspelled words from the dictionary before you do your second proofreading pass. Unfortunately that has to be done one by one; Abbyy should add check boxes to the dictionary editor.
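The m versus rn check on the exported word list can also be partly automated. The following is only a sketch under assumptions: it takes the words as a plain Python list (how you get them out of the exported file depends on its format, which is not covered here), and the name `rn_m_suspects` is made up. It flags every word containing rn whose m spelling is also in the list, so you only have to eyeball those pairs.

```python
def rn_m_suspects(words):
    """Pair up words that differ only by 'rn' versus 'm'."""
    known = set(words)
    suspects = []
    for word in sorted(known):
        if "rn" in word:
            twin = word.replace("rn", "m")
            # Both spellings are in the list: one may be an OCR misread
            if twin in known:
                suspects.append((word, twin))
    return suspects


# Hypothetical word list, as it might come out of the dictionary export
words = ["modern", "modem", "corner", "comer", "burn"]
for pair in rn_m_suspects(words):
    print(pair)
```

Both words of a pair can be perfectly valid, so this only narrows down what to look at; it does not decide which spelling is right.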

Abbyy does perform a first round of proofreading while performing OCR. In general this is a very fine feature. But it can be a pain in the XXX, and it cannot be turned off. By default it turns a French Duc into Due and so on. It never marks these "corrected" words as suspect, so you'll never find them with proofreading. So if you want to hand out copies of your PDF, please read it first.
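Reading the result yourself remains the real check, but a small watch list can at least point at the usual culprits in exported text. This is a hedged sketch, not anything Abbyy offers: `flag_watch_words`, the watch list and the sample lines are all invented for illustration. The match is case-sensitive and on whole words, so an ordinary lower-case "due" is left alone.

```python
import re


def flag_watch_words(text, watch):
    """List (line number, found word, likely original) for watched words."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for wrong, likely in watch.items():
            # \b keeps the match to whole words; the case must match exactly
            if re.search(r"\b" + re.escape(wrong) + r"\b", line):
                findings.append((line_no, wrong, likely))
    return findings


# Hypothetical watch list: word Abbyy tends to produce -> word it likely was
watch = {"Due": "Duc"}

sample = "Le Due de Guise arriva.\nPayment is due on Monday."
for finding in flag_watch_words(sample, watch):
    print(finding)
```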

I have a lot more comments on FineReader. Abbyy, if you're reading this, feel free to contact me.
