Quote:
Originally Posted by speakingtohe
I have read a couple of books that have die for the or he for the. Never thought about it being the font.
If the font has particularly thin tops to curves (or the scanning specs lightened the text), the OCR program can miss the fact that the "h" is one letter, and instead split it into two (an "l" and an "i"). If the "t" is clearly separate, "the" gets read as "tlie"; if the crossbar is long enough to almost touch the "h," it often gets read as "die." And then, of course, a spellcheck program will miss it, because "die" is a real word. (A blanket search/replace can't be used for the same reason; each hit has to be checked manually.)
Books with those problems are also prone to having "diat," "dian," "diere," "diese" and "diis." The "tl" problem isn't as common for longer "th" words; the internal dictionaries seem to be better at recognizing that "there" is more likely than "tliere."
These are problems that don't need the intense, meticulous check to fix; they just need someone flipping through pages enough to notice *one* example, and then doing quick, focused clean-up on the rest.
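For anyone who wants to script the "quick, focused clean-up" part, here is a rough Python sketch. It only flags the suspect words listed above and prints each one with some surrounding text, so a person can decide whether "die" really should be "the." The word list, the function name `find_suspects`, and the sample sentence are all made up for illustration; this isn't taken from any particular OCR or proofing tool.

```python
import re

# Suspect OCR misreadings mapped to the word they probably should be.
# Illustrative list based on the examples in this post; extend as needed.
SUSPECTS = {
    "die": "the",
    "diat": "that",
    "dian": "than",
    "diere": "there",
    "diese": "these",
    "diis": "this",
    "tlie": "the",
}

# Whole-word, case-insensitive match on any suspect word.
PATTERN = re.compile(
    r"\b(" + "|".join(sorted(SUSPECTS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def find_suspects(text, width=30):
    """Return (offset, found_word, suggested_word, context) for each hit."""
    hits = []
    for m in PATTERN.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        context = text[start:end].replace("\n", " ")
        suggestion = SUSPECTS[m.group(0).lower()]
        hits.append((m.start(), m.group(0), suggestion, context))
    return hits

# Hypothetical sample text; note the last "die" is legitimate.
sample = "He went to die store, and diat was diere he saw tlie dog die."
for offset, found, suggestion, context in find_suspects(sample):
    print(f"{offset:5d}  {found!r} -> {suggestion!r}?  ...{context}...")
```

It deliberately doesn't replace anything. The last "die" in the sample is a correct word, which is exactly why each hit needs a human looking at the context.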