MobileRead Forums - View Single Post

A4- · 12-19-2009, 04:33 AM

Whether it's 1, 2 or 3 (or a combo) I can't tell from just a vague description, but from the looks of it at least part of the problem is related to the OCR software.

I recently OCR-ed a screenshot with Dutch text on it using ABBYY set to English, and it made all sorts of weird mistakes. What that program does is make a decent guess and then run it through a sort of dictionary, so with mixed English and French text ABBYY isn't going to work well.
In the early days of OCR, the software made a guess, and if it wasn't sure you had to teach it what the letter/symbol was. I assume that kind of software will work a lot better in your case. What software that would be I don't know, though. I haven't had to OCR anything in at least a decade...
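
To show why the dictionary step hurts on mixed-language text, here is a rough sketch in Python. It is not how ABBYY actually works internally (I don't know that); it only assumes the "guess, then snap to the nearest dictionary word" idea described above. The function name `snap_to_dictionary`, the tiny word lists and the 0.8 cutoff are all made up for illustration.

```python
import difflib


def snap_to_dictionary(word, dictionary, cutoff=0.8):
    """Return `word` unchanged if it is in the dictionary, otherwise the
    closest dictionary entry above `cutoff`, otherwise `word` as-is.

    Hypothetical helper: a stand-in for the dictionary pass, not a real OCR API.
    """
    if word in dictionary:
        return word
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# Toy word lists (assumptions, far smaller than any real dictionary).
english = ["the", "mason", "is", "pretty"]
french = ["la", "maison", "est", "jolie"]

# Raw character-level guesses: one misread English word, one correct French word.
raw_guess = ["the", "maison", "is", "prctty"]

english_only = [snap_to_dictionary(w, english) for w in raw_guess]
both_languages = [snap_to_dictionary(w, english + french) for w in raw_guess]

print(english_only)
print(both_languages)
```

With only the English list, the misread "prctty" gets repaired, but the correctly read French "maison" is pulled to the nearest English word. With both lists loaded, the French word is left alone. That is the same kind of fault I got with Dutch text and an English dictionary.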

About the scans:
- High resolution, low/no compression, high contrast, and straight/horizontal lines all reduce OCR errors. Some of this you might need to fix depending on your scan and OCR results. And for text you don't need color ...
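
If you end up fixing contrast and color yourself, the idea looks roughly like the sketch below. It is plain Python on a nested list of (r, g, b) pixels so it runs without any image library; in practice you would do this in your scan software or an image editor. The function names, the 128 threshold and the grey weights (the common 0.299/0.587/0.114 luma weights) are my assumptions, not something any particular OCR program requires. Straightening skewed lines is not covered here.

```python
def to_grayscale(pixels):
    """Turn rows of (r, g, b) tuples into rows of 0-255 grey values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in pixels
    ]


def stretch_contrast(grey):
    """Rescale grey values so the darkest becomes 0 and the lightest 255."""
    lo = min(min(row) for row in grey)
    hi = max(max(row) for row in grey)
    if hi == lo:  # flat image, nothing to stretch
        return [list(row) for row in grey]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in grey]


def binarize(grey, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if v < threshold else 255 for v in row] for row in grey]


# A tiny "scan": yellowish paper next to faded brownish ink.
scan = [[(200, 190, 180), (60, 50, 40)]]

cleaned = binarize(stretch_contrast(to_grayscale(scan)))
print(cleaned)
```

The color is dropped first, then the greys are spread over the full range, then everything is forced to black or white, so faded ink on tinted paper ends up as black on white.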

Good luck.
