Sorry, KevinH, I'm not sure what you mean. Why two variants? The patterns I'm referring to vary far more than that.
For example, a common OCR error is to insert a capital letter in the middle of a word. This error does not have one, two, or three consistent replacement values. Not only can any of several capital letters turn up, but each one can stand for several different lowercase letters. For example, a capital "I" might actually represent a "t", an "i", an "l", or an "h" (when adjacent to a lowercase "i"), or... or... or... And for a capital "T", the candidate replacement values might be... and so on and so forth.
Running a separate search for each conceivable variant is simply unsustainable. Sadly, there are still some things that can only be partially automated, and human eyeballs are still a necessary part of the process.
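To illustrate the point, here is a rough sketch in Python's `re` module (not any particular editor's search dialog). A single pattern can *find* every word with a stray mid-word capital, but the replacement is ambiguous, so the sketch only builds a review list of candidate readings for a person to choose from. The names `CANDIDATES` and `review_queue` are hypothetical, and the table only holds the capital "I" readings mentioned above; it also ignores the "only when next to an i" condition for "h".

```python
import re

# Hypothetical candidate table: only the capital "I" readings come from the
# post. A real table would need entries for every capital OCR confuses, and
# context rules (e.g. "h" only next to a lowercase "i") are ignored here.
CANDIDATES = {"I": ["t", "i", "l", "h"]}

# One pattern catches every word with an uppercase letter right after a
# lowercase one, instead of one search per letter/replacement variant.
SUSPECT_WORD = re.compile(r"\b[A-Za-z]*[a-z][A-Z][A-Za-z]*\b")
# Inside a flagged word, each capital that directly follows a lowercase letter.
STRAY_CAP = re.compile(r"(?<=[a-z])[A-Z]")

def review_queue(text):
    """Return (word, readings) pairs; each reading swaps one stray capital."""
    queue = []
    for m in SUSPECT_WORD.finditer(text):
        word = m.group()
        readings = []
        for cap in STRAY_CAP.finditer(word):
            for repl in CANDIDATES.get(cap.group(), []):
                readings.append(word[:cap.start()] + repl + word[cap.end():])
        # Words with no known readings (e.g. "McDonald") are still queued,
        # because only a human can tell a real name from an OCR error.
        queue.append((word, readings))
    return queue

print(review_queue("The fIle was opened"))
```

Running it on "The fIle was opened" flags only "fIle" and offers "ftle", "file", "flle" and "fhle"; picking "file" is still a human decision. Legitimate words such as "McDonald" also get flagged, which is exactly why a blind global replace isn't safe here.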