Quote:
Originally Posted by corroonb
A trick I've found with OCR errors is to identify the consistent errors and look for other words that might not be picked up with a spell check. Obviously, this only works well if the error occurs every time, as you would expect of an automated process.
For example, I had an OCR text that had replaced every cl at the start of a word with d. It was easy to find words like dothes and doset with a spell checker and do a global replace, but I had to use a dictionary to search for every word that makes sense with either cl or d at the start. And you can't use a global replace with dean/clean or dosed/closed, as the context has to be checked.
Apologies if this is obvious.
It's a good idea. I've sort of been doing this already, but it's not the same as being completely conscious about it, and I've never thought to use a dictionary to help.
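For anyone who wants to automate the dictionary step, here is a rough Python sketch of the idea as I understand it. The function name, the tiny word list and the sample sentence are all made up for illustration; in practice you would load a real word list and your own text. It sorts words starting with d into two groups: ones that are not real words but become real with cl (probably safe to global replace), and ones that are real words either way (need checking in context).

```python
import re

def classify_ocr_candidates(text, dictionary, wrong="d", right="cl"):
    """Sort words starting with `wrong` into safe fixes and ambiguous ones.

    Hypothetical helper: `dictionary` is any set of known lowercase words.
    """
    safe, ambiguous = set(), set()
    for word in set(re.findall(r"[a-z]+", text.lower())):
        if not word.startswith(wrong):
            continue
        candidate = right + word[len(wrong):]
        if candidate not in dictionary:
            continue  # the corrected form is not a word, so leave it alone
        if word in dictionary:
            ambiguous.add(word)  # e.g. dean/clean: context must be checked
        else:
            safe.add(word)       # e.g. dothes: only the cl form is a word
    return safe, ambiguous

# Tiny stand-in word list, for illustration only.
dictionary = {"clothes", "closet", "clean", "dean", "closed", "dosed", "dog", "the"}
text = "The dean put dothes in the doset and dosed the dog."

safe, ambiguous = classify_ocr_candidates(text, dictionary)
print(sorted(safe))       # ['doset', 'dothes']
print(sorted(ambiguous))  # ['dean', 'dosed']
```

The ambiguous list is the part a spell checker will never flag, so those are the words to search for and check by hand.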