08-13-2013, 06:17 PM   #66
Elfwreck
Grand Sorcerer
Posts: 5,187
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
Quote:
Originally Posted by speakingtohe
I have read a couple of books that have die for the or he for the. Never thought about it being the font.
If the font has particularly thin tops to curves (or the scanning specs lightened the text), the OCR program can miss the fact that the "h" is one letter, and instead split it into two, an "l" and an "i." If the "t" is clearly separate, "the" gets read as "tlie;" if the crossbar of the "t" is long enough to almost touch the "h," the "tl" often gets read as a "d," giving "die"--and then, of course, a spellcheck program will miss it, because "die" is a real word. (And a blanket search/replace can't be used, for the same reason; each "die" has to be manually checked.)

Books with those problems are also prone to having "diat," "dian," "diere," "diese" and "diis." The "tl" problem isn't as common for longer "th" words; the internal dictionaries seem to be better at recognizing that "there" is more likely than "tliere."

These are problems that don't need an intense, meticulous check to fix; they just need someone flipping through pages enough to notice *one* example, and then doing quick, focused clean-up on the rest.
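
For anyone who wants to script that focused clean-up, here is a rough sketch in Python. It is only an illustration, not a tool I'm claiming anyone actually uses: the function names (`clean`, `match_case`) and the two word lists are my own assumptions, built from the misreads mentioned above. The non-words get replaced automatically; "die" is only flagged with some surrounding context, since it has to be checked by hand. (A few entries, like "diese" or "dian," could be legitimate in a book with German text or unusual names, so adjust the list to the book.)

```python
import re

# Assumed list: non-words from the post that are almost certainly
# OCR misreads of "th" words. Edit to suit the book being cleaned.
SAFE_FIXES = {
    "tlie": "the",
    "diat": "that",
    "dian": "than",
    "diere": "there",
    "diese": "these",
    "diis": "this",
}

# Real English words that may also be a misread "the".
# These are never replaced automatically, only reported.
AMBIGUOUS = {"die"}

WORD = re.compile(r"[A-Za-z]+")


def match_case(original, replacement):
    """Give the replacement the same capitalization as the original word."""
    if original.isupper():
        return replacement.upper()
    if original[0].isupper():
        return replacement.capitalize()
    return replacement


def clean(text, context=20):
    """Return (fixed_text, review), where review lists snippets to check by hand."""
    review = []

    def fix(match):
        word = match.group(0)
        lower = word.lower()
        if lower in SAFE_FIXES:
            return match_case(word, SAFE_FIXES[lower])
        if lower in AMBIGUOUS:
            # Keep a little surrounding text so a human can judge it quickly.
            start = max(0, match.start() - context)
            review.append(text[start:match.end() + context])
        return word

    return WORD.sub(fix, text), review


if __name__ == "__main__":
    fixed, review = clean("Tlie cat sat on die mat, and diat was diat.")
    print(fixed)
    for snippet in review:
        print("CHECK:", snippet)
```

The split between the two lists is the whole point: anything a spellchecker would already reject is safe to fix in bulk, and anything that is a real word goes to a person.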