Go Back   MobileRead Forums > E-Book Formats > Workshop

Old 02-04-2009, 08:32 AM   #1
PieOPah
Addict
Posts: 397
Karma: 914
Join Date: Oct 2008
Location: UK
Device: Sony PRS-505
ABBYY FineReader - Proof reading tips?

I have recently started to rip some of my books.

Using ABBYY is fantastic and it recognises most of the text it scans. My main gripe, though, is that for some words there is no dictionary suggestion, so I am unable to do a 'replace all', e.g. litdle instead of little.

Also, the number of unrecognised characters makes proof reading a slow process.

Is there anything I can do to speed things up at all?
I would love to be able to do a search/replace on common scanning errors. I would also love to have the software recognise a few more of the characters without highlighting them (it always seems to get them right, putting a comma where it should, yet it still highlights it as unrecognised!).

Are there any tips that people can give to help me speed up my proof reading stage?

Much appreciated
Old 02-04-2009, 12:02 PM   #2
slayda
Retired & reading more!
Posts: 2,743
Karma: 884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad 4, iPhone 5
I also use ABBYY FineReader. I've found it quicker to save the OCRed file to MS Word (or the free OpenOffice equivalent) to do my editing. In Word you can use the spell checker (but it doesn't find everything, e.g. ABBYY often sees I'll as 111 and the spell checker is OK with 111).

You will find many common errors depending on the printed font (e.g. rn may be seen as m and vice versa). If/when you keep running into the same error, you can then do a global replace. Some of my books are westerns and ABBYY will not recognize the tilde in señor. This can be globally changed, etc.

Another thing, ABBYY doesn't like the standard way of showing Em dashes (nor do I). The standard is "word—word". Both MS Word and ABBYY see this as a spelling error. I wrote a macro to change it to "word — word" which does not show as a spelling error and I prefer it this way. Sometimes ABBYY will see a space between words as a double space. These can be easily changed with a "replace all".
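
For anyone who would rather script these two clean-ups on the exported text than write a Word macro, here is a minimal Python sketch. It is not slayda's macro; the function name `tidy_dashes_and_spaces` is made up for illustration, and it assumes the OCR output has already been saved as plain text.

```python
import re

EM_DASH = "\u2014"  # the em dash character

def tidy_dashes_and_spaces(text: str) -> str:
    """Space out em dashes between words and collapse runs of spaces."""
    # An em dash squeezed between two word characters gets a space on each
    # side; dashes that are already spaced are left alone.
    spaced = re.sub(rf"(?<=\w){EM_DASH}(?=\w)", f" {EM_DASH} ", text)
    # Collapse two or more consecutive spaces into a single space.
    return re.sub(r" {2,}", " ", spaced)
```

Run it on a copy of the text first, since a blanket replace of double spaces will also flatten any deliberate spacing.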

Last edited by slayda; 02-04-2009 at 12:21 PM.
Old 02-04-2009, 12:16 PM   #3
RWood
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
Most of the common errors change for each text converted. ABBYY seems to do better with a serif font than with a sans-serif font.

Depending upon the printing the errors will change. If it is a light print, I find more of the "e" characters read as a "c".

As slayda mentioned, there is the confusion of "I", "l", and "1". Another common error is "O" or "o" read as "0" (zero) and the other way around. Also, quotes (single, double, and one inside another) can be a problem -- sometimes a double will be seen as two singles and compound quotes will be split the wrong way.
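
Since a spell checker is happy with things like 111, one way to catch the I/l/1 and O/o/0 mix-ups is to list every token that mixes letters and digits, or that is nothing but repeated 1s, and then check those by hand. A rough Python sketch, assuming the text has been exported from ABBYY or Word as plain text (the function name `suspect_tokens` is just for illustration):

```python
import re
from typing import List

def suspect_tokens(text: str) -> List[str]:
    """Return tokens worth a manual look for I/l/1 and O/o/0 confusion."""
    suspects = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_letter = any(ch.isalpha() for ch in token)
        has_digit = any(ch.isdigit() for ch in token)
        # "11", "111" and so on: a likely misreading of I/l shapes.
        only_ones = len(token) >= 2 and set(token) == {"1"}
        if (has_letter and has_digit) or only_ones:
            suspects.append(token)
    return suspects
```

It only produces a review list. Real numbers such as 11 will show up too, so nothing should be replaced automatically.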

I take the text into Word as soon as I can. I find it does a far better job than running the editor in ABBYY.
Old 02-04-2009, 12:28 PM   #4
Elfwreck
Grand Sorcerer
Posts: 5,140
Karma: 24387938
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Cli; PRS-505; EZR Pocket Pro, PRS-600, Kobo Mini
I use ABBYY 7, so it's possible some of the menu options have shifted by 8 or 9, but I expect they still all exist.

Tools-->Options: Check Spelling tab
Unclick "Ignore words with digits and other nonalphabetical characters." Do this before running OCR.

To add all grammatical forms of words to the dictionary, unclick "Skip prompting for word forms." (Note: this can add much time to the proofreading, but it helps the dictionary later. Especially useful if you do lots of a single type of documents that have lots of their own vocabulary--legal docs, medical, fashion magazines, whatever.)

Error display level: Set to "thorough" before running the OCR.

While editing:
Control-H gets "find and replace." You can use this to fix common OCR errors, but be careful that you want to replace *all* uses of the word or word-section. (Good to find all of those cases of "die" instead of "the," but you don't want to use Replace All on those.)

When a suspect appears, "accept" is the default selection; you can use return instead of a mouse-click to go on to the next one. (You may know this; I've run into people who didn't, and tried to click on the mouse for every single suspect. Good recipe for carpal-tunnel syndrome.)

When the text window is open for corrections, you can move around in it with mouse or arrows; I'll often correct every error I can see in the window before continuing with the automated error finder; this lets me spot some errors that the program doesn't recognize. (Like using hyphens instead of superscript 1's. Or spelling burn as bum.)
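
For words like "die" or "bum" that are sometimes right and sometimes an OCR slip, a Replace All is too blunt. Outside ABBYY, a small script can at least list every whole-word hit with some context so each one can be judged. This is only a sketch on exported plain text; `show_in_context` and its `width` parameter are made-up names.

```python
import re
from typing import List

def show_in_context(text: str, word: str, width: int = 30) -> List[str]:
    """List each whole-word occurrence of `word` with surrounding text."""
    snippets = []
    # \b keeps "die" from matching inside longer words such as "diesel".
    for match in re.finditer(rf"\b{re.escape(word)}\b", text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        snippets.append(text[start:end].replace("\n", " "))
    return snippets
```

Calling `show_in_context(text, "die")` and reading down the list is usually enough to see which hits should have been "the".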
Old 02-04-2009, 12:49 PM   #5
PieOPah
Addict
Posts: 397
Karma: 914
Join Date: Oct 2008
Location: UK
Device: Sony PRS-505
Thank you all for the input.

I like using the ABBYY editor as it allows me to see the scan matching up with what I am editing. As a result, I can quickly see what a word is meant to be or what punctuation should be like etc.

I'll most definitely be using Ctrl+H for my find/replace - this should save me a fair amount of time with some of the more common errors.

I am also starting to use the pattern trainer so hopefully this will reduce the number of errors on a page.
Old 02-13-2009, 08:59 AM   #6
Maggy
Junior Member
Posts: 7
Karma: 39
Join Date: Oct 2008
Device: iliad 1
Although I agree that Abbyy does a very fine job in general, I believe there is still lots of room for improvement.

Unfortunately, Abbyy has no forum; IMHO it should have one. I hope they read this.

First of all, Search and Replace works in one direction only, selectable up or down; it does not continue from the top. So what I always do is jump to the first page, search in all pages, jump back to the top, and search for the next word. Before I start doing searches, I first walk through the document and make search/replace notes in a text editor.

When you replace words that appear both capitalised and in lower case, first replace the exact capitalised form, then repeat for lower case only. For example, if your document contains the French word musée and Musée but OCR skipped some accents, first search for Musee, then musee.

Often I see groups of characters in different words that I want to replace. For example, yesterday I had a document in which in several places official, difficult and so on had ffd instead of ffic. Searching just for that group is much faster than for the different words.

Too bad it doesn't allow wildcards or regular expressions.
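
If the text is exported from Abbyy first, the case-sensitive and character-group replacements above can be done with regular expressions in a script instead. The Python sketch below is only an illustration under that assumption: the correction table and the name `apply_fixes` are hypothetical, built from the examples in this post.

```python
import re
from typing import Dict

# Hypothetical correction table taken from proofreading notes. Keys are
# matched case-sensitively, so "Musee" and "musee" get separate entries.
CASE_SENSITIVE_FIXES: Dict[str, str] = {
    "Musee": "Mus\u00e9e",
    "musee": "mus\u00e9e",
}

def apply_fixes(text: str) -> str:
    """Apply the whole-string fixes, then repair the "ffd" character group."""
    for wrong, right in CASE_SENSITIVE_FIXES.items():
        text = text.replace(wrong, right)  # str.replace is case-sensitive
    # Fix "ffd" only when it sits inside a word, so "offdial" becomes
    # "official" and "diffdult" becomes "difficult".
    return re.sub(r"(?<=\w)ffd(?=\w)", "ffic", text)
```

Keeping the table in one place means the same notes can be reused on the next book scanned from the same typeface.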

Never trust the layout of the preview; once you've turned it into PDF it looks much better. Never try to edit the layout of the preview in Abbyy either; most likely it will ruin the layout of your final PDF. I'm still searching for the easiest way to correct layout errors in final files created by Abbyy. Currently the best way I can find is:
- see if the PDF is good enough
- if not, export as a Word document and see if it's easy to fix in Word
- if not, export as text, create a new Word document and a new style sheet

Abbyy can make a weird type of error while scanning 2-column index pages. It may think that they are 2 pages of a curved book and start trying to warp them, actually bending long lines so much they become unreadable. At first I avoided this by creating 2 scans per page, covering 1 column per scan. But there is an easier solution: simply import the same scan twice, first selecting only the left column for OCR, then the right one. Merge them into a single page using Word.

When you start Abbyy proof reading, never ever allow it to add extra spaces after dots, commas etc. On documents that give bad OCR results, first proof read almost blindfolded, adding ALL found words. Then open the dictionary editor, export, and read the list in Notepad using the Courier font. You can now see the difference between m and rn and so on much more easily. Of course you'll have to remove all misspelled words from the dictionary before you do your second proof reading pass. Unfortunately that has to be done one by one; Abbyy should add check boxes to the dictionary editor.
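
The m versus rn check on the exported word list can also be narrowed down by a script before reading it by eye. The Python sketch below groups words that become identical once every "rn" is read as "m", so only those groups need a close look. It assumes the exported dictionary is a plain list of words, one per entry; the export format is not described here, and `rn_m_lookalikes` is a made-up name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

def rn_m_lookalikes(words: Iterable[str]) -> List[List[str]]:
    """Group words that look alike when "rn" is misread as "m"."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for word in words:
        # Words that collapse to the same key differ only by rn/m.
        groups[word.replace("rn", "m")].add(word)
    # Keep only keys shared by at least two distinct words.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

A group such as bum/burn does not say which spelling is wrong; it only says both forms are in the list and one of them may be an OCR slip.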

Abbyy does perform a first round of proof reading while performing OCR. In general this is a very fine feature. But it can be a pain in the XXX, and it cannot be turned off. By default it turns a French Duc into Due and so on. It never marks these "corrected" words as suspect, so you'll never find them with proof reading. So if you want to hand out copies of your PDF, please read it first.

I have a lot more comments on Finereader, Abbyy, if you're reading this feel free to contact me.
Old 02-13-2009, 10:16 AM   #7
slayda
Retired & reading more!
Posts: 2,743
Karma: 884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad 4, iPhone 5
I have to agree that "foreign" words (i.e. words not in the main language of the book) often leave off the proper letter markings. E.g. in westerns I scan, ABBYY will see señor as senor. I wish there were a way to designate a secondary language, i.e. in the above case, English would be primary and Spanish would be secondary.
Old 02-13-2009, 01:32 PM   #8
Timoleon
Time Enough at Last
Posts: 385
Karma: 1151316
Join Date: Feb 2008
Location: New England
Device: iPad 3, iPhone 5, Kindle 3, Fire, Sony PRS-350
When I scan a page to be ocr'd, I always run the image through a graphics editor afterwards for sharpening and contrast before I give it the final pass through ABBYY FineReader. I find that this cuts down on the read errors immensely!
Old 02-13-2009, 02:19 PM   #9
DDHarriman
Guru
Posts: 854
Karma: 1200
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
Hi Maggy

Outstandingly good information you have shared here.
Could you be so kind as to post more of your comments? They are really invaluable!

Best regards,
Old 02-14-2009, 05:11 AM   #10
Andybaby
Wizard
Posts: 1,279
Karma: 2683
Join Date: Nov 2008
Location: New York
Device: PRS-700
I spent 8 hours proofing a book in ABBYY today.

While spell checking, if you come across the same suggested error a lot, but it's right, do a find/replace on it with the same characters and it will stop popping up. (Make sure you use Match Case, or else you will change the wrong things all over the document.)

If you get a lot of errors, it doesn't hurt to use pattern recognition, which I had to do. It made it a lot more accurate.
Old 03-30-2009, 09:14 PM   #11
michael_v2
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2009
Device: Sony PRS-505
Hi,
I was wondering how I can include an enumerated list in BD? The HTML tags (OL, LI) work in BD and I am able to see them. But when I make a Sony Reader file they are gone, leaving just plain paragraphs.
Old 04-09-2009, 06:49 AM   #12
Gall
Junior Member
Posts: 1
Karma: 10
Join Date: Apr 2009
Device: none
ABBYY's forum

Quote:
Originally Posted by Maggy View Post
Unfortunately Abbyy has no forum, IMHO it should have. I hope they read this.
ABBYY has a forum in Russian, but I think you may post there in English too:
http://www.abbyy.ru/finereader/forum/actualforum.aspx
You may use http://translate.google.com to understand registration fields and to choose appropriate subforum.

Good luck

PS. Sorry for my barbarian English
Old 04-09-2009, 09:51 AM   #13
murraypaul
Interested Bystander
Posts: 3,224
Karma: 10210627
Join Date: Jun 2008
Device: Sony PRS505, Nook Color(CM7), iPad3
Quote:
Originally Posted by slayda View Post
I have to agree that "foreign" words (i.e. words not in the main language of the book) often leave off the proper letter markings. E.g. in westerns I scan, ABBYY will see señor as senor. I wish there were a way to designate a secondary language, i.e. in the above case, English would be primary and Spanish would be secondary.
I don't have a version to hand, but you can select multiple languages for OCR. So in this instance you could OCR it with both English and Spanish selected. It will then recognise accents suitable for all the languages you select, and use dictionaries for all of them, IIRC.
Old 04-09-2009, 10:02 AM   #14
BenG
Wizard
Posts: 3,798
Karma: 59154248
Join Date: Jun 2007
Location: Arkham, MA
Device: Paperwhite
You can also add accented characters to a language through the Language Editor. This works for FineReader 7 Professional.

1) Go to the Tools Menu and click Language Editor. You can also use the keyboard shortcut Ctrl+Shift+L to open up the Language Editor window while in FR.

2) In the Language Editor Window click on the "New" button.

3) In the "New Language or Group" Tab select the first option (Create a new language based on) and make sure English is selected. What this does is create a new dictionary file based on your previous English Dictionary.

4) In the "Simple Language Properties" Tab:

1. Fill in a "Language Name" of your choice. Preferably make up a name that will remind you why you made it (I call it "English Plus").
2. Leave the "Source Language" as is (or change it to English-British).
3. Now at the "Alphabet" section, highlight the entire LINE with the mouse (or use the keyboard shortcut Shift+End) and then tap the Delete key on your keyboard (leaving nothing in this area). Replace the empty area with this line of characters (remember to keep it all on one line):

™!"#$%&'()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

5) Keep the "Built-in dictionary" button marked, click OK, exit out of the program windows and you are all set.

6) Now all that remains is for you to select the new language you have created from the drop-down list on the FineReader toolbar under "User Languages". And that is all there is to recognizing foreign characters.

Note: when you add a word to the dictionary that contains non-English letters a warning box will pop up, just click through it. It is only warning you about the foreign characters. Then add the word to your dictionary as you normally would.
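
Before settling on an alphabet line, it can help to check which characters in a sample of the book are not covered by it. The Python sketch below does that on plain text. It has nothing to do with FineReader's own file formats, and `missing_from_alphabet` is a hypothetical helper name.

```python
from typing import Set

def missing_from_alphabet(text: str, alphabet: str) -> Set[str]:
    """Return the non-space characters of `text` not found in `alphabet`."""
    allowed = set(alphabet)
    return {ch for ch in text if not ch.isspace() and ch not in allowed}
```

Any characters it reports are candidates to paste into the "Alphabet" field of the new language.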
Old 04-09-2009, 10:09 AM   #15
PieOPah
Addict
Posts: 397
Karma: 914
Join Date: Oct 2008
Location: UK
Device: Sony PRS-505
That seems like a nice straightforward way of doing things.

Will look into that next time I boot up the software. Karma to you
MobileRead.com is a privately owned, operated and funded community.