whether its 1, 2 or 3 (or a combo) I cant tell with just a vague description, but from the looks of it at least part of it is related to the ocr-software.
I've recently ocr-ed a screenshot with Dutch text on it with abbyy english, and it made all sorts of weird faults. What that program does is make a decent guess and then run it trough a sort of dictionary, so with English and French mixed text abbyy isn't gonna work well.
In the early days of ocr, ocr-software made a guess and if it wasn't sure you had to teach it what the letter/symbol was. I assume that kind of software will work a lot better in your case. What software that would be I don't know tho. I haven't had to ocr anything in at least a decade...
about the scans:
- high resolution, low/no compression, high contrast, and straight/horizontal lines all reduce ocr-faults. Some of this you might need to fix depending on your scan- and ocr-results. And for text you don't need color ...
gl
|