05-10-2025, 09:07 PM   #17
ElMiko
Evangelist
Posts: 471
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Sorry, KevinH, not sure what you mean. Why two variants? The patterns I'm referring to are not nearly so limited in their variance.

For example, a common OCR error is to insert a capitalized letter in the middle of a word. This error does not have 1, 2 or 3 consistent replacement values. Not only can any number of capital letters show up, but there are multiple possible replacements for any given one. For example, a capital "I" might actually represent a "t", or an "i", or an "l", or an "h" (when adjacent to a lowercase "i") or... or... or... And for a capital "T", well, the candidate replacement values might be... and so on and so forth.

Running a separate search for each conceivable variant is simply unsustainable. Sadly, there are still some things that can only be partially automated. Human eyeballs are still a necessary part of the process.
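
To illustrate the "partially automated" part, here is a rough sketch in Python of a single search that flags every word with a capital letter stuck between two lowercase letters, and leaves the choice of replacement to a human. The function name and the pattern are only my assumptions for illustration, not a recommended workflow, and the pattern handles plain ASCII letters only.

```python
import re

# A "word" here is just a run of ASCII letters (an assumption; accented
# characters are not handled in this sketch).
WORD = re.compile(r"[A-Za-z]+")

# One pattern for the whole class of error: an uppercase letter with a
# lowercase letter on each side, whichever capital it happens to be.
MIDWORD_CAPITAL = re.compile(r"[a-z][A-Z][a-z]")


def flag_midword_capitals(text):
    """Return (word, offset) pairs for a human to review.

    Nothing is replaced: the correct fix for "whIch" or "wiTh" depends
    on the word, so this only narrows down where to look.
    """
    return [
        (match.group(), match.start())
        for match in WORD.finditer(text)
        if MIDWORD_CAPITAL.search(match.group())
    ]


if __name__ == "__main__":
    sample = "He went wiTh her to McDonald's, whIch was closed."
    for word, offset in flag_midword_capitals(sample):
        print(offset, word)
```

On that sample it flags "wiTh" and "whIch", but also "McDonald", which is a perfectly good name. That is exactly why the last step still needs eyeballs.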