Yet another common OCR bugaboo is reading the pair cl as a lowercase d. It would help a lot if OCR software simply kept a list of such common goofs, plus lists of words that often have recognition problems, and then presented any matches to the user, along with the surrounding words, for correction.
Another one I've seen a lot of is turning italic sans-serif uppercase I and lowercase l into forward slashes.
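Here's a rough Python sketch of that kind of review list. The confusion table, the `flag_suspects` function, and the tiny word list are all made up for illustration; a real tool would need a full dictionary and would work from the OCR engine's own token positions. It flags a word whenever undoing a single known goof (cl/d, rn/m, slash for l or I) turns it into a dictionary word. That catches outright misreads like "rnodern" as well as real-word collisions like "clog"/"dog", and shows each one with the words around it.

```python
# Hypothetical table of common OCR goofs, written as
# (what shows up in the OCR output, what the page may actually say).
CONFUSIONS = [
    ("d", "cl"), ("cl", "d"),   # cl misread as d, and the reverse
    ("m", "rn"), ("rn", "m"),   # rn misread as m, and the reverse
    ("/", "l"), ("/", "I"),     # italic sans-serif l or I misread as a slash
]

PUNCTUATION = ".,;:!?\"'()"


def one_fix_variants(word):
    """Yield every spelling reachable by undoing one goof at one position."""
    for seen, meant in CONFUSIONS:
        pos = word.find(seen)
        while pos != -1:
            yield word[:pos] + meant + word[pos + len(seen):]
            pos = word.find(seen, pos + 1)


def flag_suspects(text, dictionary, context=3):
    """Return (word, suggestions, snippet) for every word that has a
    dictionary spelling one goof away, with `context` words on each side."""
    tokens = text.split()
    flagged = []
    for i, token in enumerate(tokens):
        word = token.strip(PUNCTUATION)
        suggestions = sorted(
            {v for v in one_fix_variants(word) if v.lower() in dictionary}
        )
        if suggestions:
            snippet = " ".join(tokens[max(0, i - context): i + context + 1])
            flagged.append((word, suggestions, snippet))
    return flagged


if __name__ == "__main__":
    words = {"i", "saw", "the", "modern", "dog", "clog", "at", "a", "lamp"}
    for word, fixes, snippet in flag_suspects(
            "I saw the rnodern clog bark at a /amp.", words):
        print(f"{word!r} -> {fixes}   [...{snippet}...]")
```

Limiting it to one goof per word keeps the suggestion list short enough for a person to skim; words with several misreads would need the variants applied repeatedly.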
Much of it depends on the quality of the paper, which affects how much the ink spreads, but fer cripes' sake, there shouldn't be a mix-up between rn and m if the software simply compared the width of rn vs. m. I don't see how it could possibly read a lowercase m as rn, much less do it to 100% of the lowercase m's in a book, which I've seen a few times. In those same books, every instance of rn was rendered as an m.
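Here's a Python sketch of what that width check might look like. It assumes the OCR engine can report the pixel width of each glyph cluster and has already found some m's and rn's on the same page that it's confident about; `classify_m_or_rn` and the sample widths are hypothetical. Whether the two widths separate cleanly will depend on the typeface and the scan, which is also an assumption here.

```python
from statistics import median


def classify_m_or_rn(cluster_width, m_widths, rn_widths):
    """Label an ambiguous glyph cluster as 'm' or 'rn' by comparing its width
    against reference widths taken from the same page.

    cluster_width: width in pixels of the cluster in question.
    m_widths, rn_widths: widths of 'm' and 'rn' sequences on the same page
    that were recognized with high confidence. Taking the references from the
    same page means paper-dependent ink spread shifts them along with the
    cluster being judged.
    """
    m_ref = median(m_widths)
    rn_ref = median(rn_widths)
    # Ties go to 'm'; the whole idea assumes the two reference widths differ.
    if abs(cluster_width - m_ref) <= abs(cluster_width - rn_ref):
        return "m"
    return "rn"


if __name__ == "__main__":
    # Made-up measurements, purely for illustration.
    m_seen = [29, 30, 31, 30]
    rn_seen = [34, 35, 36]
    for width in (30, 33, 35):
        print(width, classify_m_or_rn(width, m_seen, rn_seen))
```

Using the median of each reference set keeps one badly segmented glyph from dragging the reference width off.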
As for any device or OS that doesn't support Unicode, it's not "defective"; it just doesn't support Unicode. It's also highly unlikely that any such device will ever get Unicode support. Therefore, e-book creation software should have the OPTION to create non-Unicode output when making an e-book for reader software such as Mobipocket or TealDoc, each of which has a version for those platforms.
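Here's a minimal Python sketch of what such an option might do: keep every character the target 8-bit encoding can hold, swap in a plain-ASCII stand-in for common typographic punctuation, strip accents from anything else, and fall back to '?' as a last resort. The `to_legacy` function and its fallback table are made up for illustration, and the cp1252 default is only an assumption; check which codepage Mobipocket or TealDoc actually expects on a given platform.

```python
import unicodedata

# Hypothetical stand-ins for characters a legacy 8-bit target may lack.
FALLBACKS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "--",  # en dash and em dash
    "\u2026": "...",                # ellipsis
    "\u00a0": " ",                  # no-break space
}


def to_legacy(text, encoding="cp1252"):
    """Encode text for a reader without Unicode support.

    Characters the target encoding can hold are kept as-is; others get an
    entry from FALLBACKS, otherwise their accents are stripped, and anything
    still unencodable becomes '?'.
    """
    out = []
    for ch in text:
        try:
            out.append(ch.encode(encoding))
            continue
        except UnicodeEncodeError:
            pass
        sub = FALLBACKS.get(ch)
        if sub is None:
            # Decompose (e.g. 'ő' -> 'o' + combining accent), keep base letters.
            decomposed = unicodedata.normalize("NFKD", ch)
            sub = "".join(c for c in decomposed
                          if not unicodedata.combining(c)) or "?"
        out.append(sub.encode(encoding, errors="replace"))
    return b"".join(out)


if __name__ == "__main__":
    print(to_legacy("\u201cCaf\u00e9\u201d \u2014 \u0151", "ascii"))
    print(to_legacy("Caf\u00e9 \u0151"))
```

Trying the real character first means a richer codepage like cp1252 keeps its own curly quotes and accented letters, and the fallbacks only kick in for what it genuinely can't represent.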