MobileRead Forums - View Single Post

DebbyS · 05-03-2014, 06:12 PM

For my current project, I searched for "any digit" followed by "o" [any digit + oh] so I could check whether the "o" should be a "0" (zero). The OCR was also italicizing words it shouldn't have. Mostly it carried the italics on Huichol and Spanish words over onto the next few English words. So at the end I'll search for an italicized [blank] to catch any I missed, and I'll also search for [blank] in [DPCustomMono] and change that to Times New Roman. An accented "o" (oh) tends to become a "6", too, but that's easy to spot. I'm sure that if the book's font had been larger than 10 point or so, the OCR would have been much more accurate. I'm really glad to have that weird font to use, but I'll also look into "regex" to see what it is and whether I can use it as well.
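On the "regex" question: a regular expression is a small pattern language for searches like "any digit followed by o". Many editors that offer a regular-expression search option would accept something like `[0-9][oO]` for that. Below is a minimal Python sketch, assuming the OCR output has been saved as plain text. The pattern names, the `find_suspects` helper, and the sample sentence are made up for illustration. Italics and fonts are lost in plain text, so this only covers the two character-level errors, and it changes nothing: it just lists each suspicious spot with some context, because every "o" or "6" still needs a human decision.

```python
import re

# Hypothetical pattern: a digit immediately followed by the letter o/O,
# e.g. "19o5" or "2o", where the "o" may really be a zero.
DIGIT_THEN_O = re.compile(r"\d[oO]")

# Hypothetical pattern: a "6" with a letter on both sides, e.g. "c6mo",
# where the OCR may have misread an accented o. [^\W\d_] matches any
# Unicode letter (a word character that is not a digit or underscore).
SIX_IN_WORD = re.compile(r"(?<=[^\W\d_])6(?=[^\W\d_])")


def find_suspects(text, width=10):
    """Return (position, label, match, context) for each suspicious spot.

    Nothing is replaced; this is like stepping through Find results.
    """
    hits = []
    for label, pattern in (("digit+o", DIGIT_THEN_O), ("6 in word", SIX_IN_WORD)):
        for m in pattern.finditer(text):
            start = max(0, m.start() - width)
            context = text[start:m.end() + width]
            hits.append((m.start(), label, m.group(), context))
    # Report hits in the order they appear in the text.
    hits.sort()
    return hits


if __name__ == "__main__":
    sample = "Born in 19o5, he asked c6mo the 2o pesos were spent."
    for pos, label, found, context in find_suspects(sample):
        print(f"{pos:>4}  {label:<10} {found!r:<6} ...{context}...")
```

Stand-alone numbers like "6 pesos" are not flagged by the second pattern, since the "6" needs a letter on both sides.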
