Proofing Tips and Tricks

-- Common OCR Errors
---- Tips
-- Using the Ctrl+H shortcut in FineReader


Common OCR Errors

This file lists some of the more common OCR errors that appear when proofing a newly scanned book. A fair number of these errors can be corrected within FineReader using the spell checker and, later, the Ctrl+H command (more on Ctrl+H later on). Please note that this is only a rough list that I use when proofing OCR errors; it is by no means complete or foolproof.

Warning: when I made and used my first OCR error list I did something really stupid: I ran an automatic search and replace. That can turn an obvious error into a correctly spelled but wrong word, and correctly spelled 'incorrect' words are far harder to spot than words which are obviously incorrect.

So take a word of warning: when doing a search and replace, be extremely careful. The best way to go is to step through the matches using the "next" button rather than being tempted to use the "change all" (or "correct all") option.
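
For anyone cleaning up already exported text outside FineReader, the same "one match at a time" habit can be sketched in Python. This is only a rough illustration, not anything FineReader does internally: the function name replace_reviewed and the approve callback are made up for this example, and a real session would show each snippet to a person instead of using a lambda.

```python
import re

def replace_reviewed(text, error, correction, approve, context=15):
    """Replace whole-word matches of `error` only where approve(snippet) is True.

    `approve` is a hypothetical callback: it receives the match plus a little
    surrounding text and decides whether this instance really is an error.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(error) + r"(?!\w)")
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        snippet = text[max(0, match.start() - context):match.end() + context]
        pieces.append(text[last:match.start()])
        # Keep the original word unless the reviewer approves the change.
        pieces.append(correction if approve(snippet) else match.group(0))
        last = match.end()
    pieces.append(text[last:])
    return "".join(pieces)

text = "He said hi to me. The cat sat hi the box."
# Stand-in for a human decision: only the second "hi" is a misread "in".
fixed = replace_reviewed(text, "hi", "in", lambda snippet: "sat hi the" in snippet)
print(fixed)  # He said hi to me. The cat sat in the box.
```

The point of the callback is that every change is a decision, which is exactly what the "change all" option skips.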

Common Errors

Error       Correction    Error       Correction

wc          we            hi          in
lne         me            77ze        The
ail         all           m           in
AH          All           TJie        The
j           ,             he          be
ot          of            ran         can
/           ,'            modem       modern
la          k             hut         but
rnay        may           rn          m
rnent       ment          . . .       ...
(space)?    ?             (space);    ;
(space)!    !             n           II
em          ern           .^P^P" '    .^P^P"'
liim        him           w.iiched    watched
arid        and           1           I
OP          OF            1           l
Prom        From          mam         main
flics       flies         withm       within
wmged       winged        fmger       finger

(space) marks a stray leading space; ^P marks a paragraph break.

Less Common Errors

Error       Correction    Error       Correction

dc          de            :.          :
nI          M             .:          :
unc         une           ;,          ;
rnm         mm            ]-          j
mrn         mm            )-          j
ovennight   overnight     1-          i
frcedman    freedman
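
If you keep your own list outside FineReader, the tables above can be stored as a simple lookup and used to count suspects in exported text before you change anything. The sketch below is only an assumption about how one might do that in Python; the names OCR_ERRORS and count_suspects are invented for the example, and it only handles purely alphabetic errors (so entries like "77ze" or the punctuation pairs are left out).

```python
import re
from collections import Counter

# A few pairs copied from the lists above; extend with your own findings.
OCR_ERRORS = {
    "wc": "we",
    "ail": "all",
    "arid": "and",
    "modem": "modern",
    "hut": "but",
    "liim": "him",
    "withm": "within",
}

def count_suspects(text, errors=OCR_ERRORS):
    """Count whole-word, case-sensitive occurrences of each listed error."""
    words = re.findall(r"[A-Za-z]+", text)
    return dict(Counter(word for word in words if word in errors))

sample = "The modem reader sat arid waited, hut the hut was empty."
for word, count in sorted(count_suspects(sample).items()):
    print(word, "->", OCR_ERRORS[word], "x", count)
```

Note that this only counts. Words such as "modem" and "hut" are real words, so each hit still needs checking against the page image.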

Tips

Each book can produce different OCR errors, depending on its font, size, style, background, paper quality, etc. The usual culprits are italics and smaller or unusual fonts.

From experience, it is best to look for the most obvious errors and take note of them (make your own OCR error list if you want). At the end of your scanning session, before exporting to text / html / rtf etc., go through your batch files correcting the errors in FineReader. I say this because once you have exported the file for further editing you lose the ability to compare it to the image, and it is far harder to look through the book for the incorrect word than to simply compare it to the original image in FineReader.

MS Office 2003 combined with FineReader 7 is said to resolve this problem; when I get Office 2003 I will test it out.

Two Good Text Editors

Within almost any good text editor (I prefer NoteTab Pro and / or InterParse) you can enter special commands in either the search and replace box or the editor's own command line. For example, (.^P^P" ' [to] .^P^P"') is a common one I use in NoteTab Pro, where ^P represents a line / paragraph break. InterParse, however, allows for quite complex operations for those who do not mind taking the time to learn its real functionality.

InterParse -Freeware- (for Linux and Windows) can be found here: http://www.interparse.com/

NoteTab Pro -$19.95 USD- can be found here: http://www.notetab.com/
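
For readers without either editor, the NoteTab Pro replacement mentioned above (.^P^P" ' to .^P^P"') can be approximated in Python. This is a minimal sketch under one assumption: that each ^P corresponds to a single "\n" in the exported text (files with Windows line endings would need "\r\n" instead). The function name fix_split_quotes is made up for the example.

```python
def fix_split_quotes(text):
    """Rejoin an opening double quote and single quote that OCR split with a
    space, but only at the start of a paragraph that follows a full stop."""
    # Assumption: NoteTab's ^P^P is written here as two newline characters.
    return text.replace('.\n\n" \'', '.\n\n"\'')

before = 'He left.\n\n" \'Go,\' she said."'
print(fix_split_quotes(before))
```

Because the pattern includes the full stop and the paragraph break, a quote followed by a space in the middle of a line is left alone.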


Using the Ctrl+H shortcut in FineReader

1# Keeping the above OCR errors in mind, while in FineReader you can click the search and replace button (keyboard shortcut: Ctrl+H) to bring up the search and replace dialogue. If you do not have the button showing, see the quick 'how to add buttons in FineReader' guide.

2# The search and replace box should pop up. As can be seen, it is a lot like the standard search and replace box in almost every good text editor. Ticking the first two boxes ("Match whole words only" and "Match Case") helps narrow the search for the incorrect word, and the last box ("Look through all batch pages") checks all of your scans in the current batch.

Clicking the "Find Next" button takes you to the first matching error. If it is an error (check the image window as well), click "Replace"; if it is not an error, click "Find Next" and it will skip that instance and go looking for the next mistake.
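
To see why the two tick boxes matter, here is a rough Python sketch of what "Match whole words only" and "Match Case" do to a search. It uses Python's re module, not FineReader, so treat it as an analogy; build_pattern and its parameters are hypothetical names chosen for this example.

```python
import re

def build_pattern(word, whole_word=True, match_case=True):
    """Build a search pattern that mimics the two tick boxes."""
    body = re.escape(word)
    if whole_word:
        # Refuse matches that sit inside a longer word such as "This" or "third".
        body = r"(?<!\w)" + body + r"(?!\w)"
    flags = 0 if match_case else re.IGNORECASE
    return re.compile(body, flags)

text = "Hi there. This is hi the third chapter."
for whole_word in (True, False):
    for match_case in (True, False):
        hits = build_pattern("hi", whole_word, match_case).findall(text)
        print("whole words:", whole_word, "match case:", match_case, "hits:", len(hits))
```

With both options on, only the lone lowercase "hi" is found; turning both off also picks up "Hi" and the "hi" inside "This" and "third", which is a lot more to click through.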

# Tip: Click the first page in your scanning batch before you start. Sometimes "Look through all batch pages" only searches from the page you are currently on, so if you are (or were) looking at page 50 it will search from page 50 to the end of the book, missing the first 49 pages.

Note: Avoid "Replace All" unless you are almost 100% sure that it is a common error, because it cannot be undone easily. The example used here (replacing "hi" with "in") is fairly common, and in the books I do (mainly reference works etc.) it is usually safe to click the "Replace All" button. However, if you are doing a book with dialogue between people it would not be safe, since the word "hi" would legitimately appear and you would have just changed it to "in".
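
The "hi" problem is easy to reproduce on exported text. The snippet below is a small Python illustration of a blind replace-all, not a description of FineReader itself; replace_all is a name invented for the example, and the sample sentence is made up.

```python
import re

def replace_all(text, error, correction):
    """Blindly replace every whole-word, case-sensitive match."""
    return re.sub(r"(?<!\w)" + re.escape(error) + r"(?!\w)", correction, text)

line = '"Oh, hi," she said, standing hi the doorway.'
print(replace_all(line, "hi", "in"))
# "Oh, in," she said, standing in the doorway.
```

The second "hi" was a genuine OCR error, but the greeting was correct and is now wrong, and because "in" is a real word no spell checker will flag it.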


© 2003 http://ebook.23ae.com/