Old 08-12-2009, 11:35 AM   #72
corroonb
A trick I've found with OCR errors is to identify the consistent errors and then look for other affected words that a spell check won't pick up, because the error happens to produce a valid word. Obviously this only works well if the error occurs every time, as you would expect of an automated process.

For example, I had an OCR text that had replaced every "cl" at the start of a word with "d". It was easy to find words like "dothes" and "doset" with a spell checker and fix them with a global replace, but I also had to use a dictionary to search for every word that makes sense with either a "cl" or a "d" at the start. You can't use a global replace for pairs like dean/clean or dosed/closed, as the context has to be checked.
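
If you'd rather not do the dictionary search by hand, something like the rough Python sketch below might help. It is only an illustration of the idea above, not a tool I used at the time: the function name `classify_cl_errors` and the tiny word list are made up for the example, and you would need to supply your own dictionary (any plain word list loaded into a set should do). It splits words starting with "d" into ones that are safe to replace globally and ones that need a manual context check.

```python
def classify_cl_errors(text_words, dictionary):
    """Sort words starting with 'd' into two groups, assuming the OCR
    turned every word-initial 'cl' into 'd'.

    safe:      the 'd' form is not a real word but the 'cl' form is,
               so a global replace should be fine (dothes -> clothes).
    ambiguous: both forms are real words, so each occurrence has to be
               checked in context (dean / clean).
    """
    safe = {}
    ambiguous = {}
    for word in sorted({w.lower() for w in text_words}):
        if not word.startswith("d"):
            continue
        candidate = "cl" + word[1:]
        if candidate not in dictionary:
            # Restoring 'cl' gives nothing sensible, so leave the word alone.
            continue
        if word in dictionary:
            ambiguous[word] = candidate
        else:
            safe[word] = candidate
    return safe, ambiguous


# Hypothetical mini dictionary, just for the example.
dictionary = {"clothes", "closet", "clean", "dean", "closed", "dosed", "dog", "door"}
words = ["dothes", "doset", "dean", "dosed", "dog", "door"]

safe, ambiguous = classify_cl_errors(words, dictionary)
print("Global replace:", safe)
print("Check context:", ambiguous)
```

The "ambiguous" list is the important one, since those are exactly the words a spell checker will never flag.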

Apologies if this is obvious.