MobileRead Forums - View Single Post

Mike L · 12-19-2009, 04:51 AM

Ficbot,

I would've thought that OCR software worked in exactly the same way, regardless of the language. It looks at each character separately, and tries to determine which letter or symbol it represents. It doesn't know anything about words or sentences or meanings. It just converts shapes to letters, etc.
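To make that concrete, here's a toy sketch in Python of the character-by-character idea. It is only an illustration of the principle as I've described it, not how any particular OCR package works: the 3x3 "glyphs", the `TEMPLATES` table and the `recognise` function are all made up for the example, and real OCR engines are far more elaborate.

```python
# Toy 3x3 "glyphs": 1 = ink, 0 = blank.
# These templates are invented for illustration only.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def distance(a, b):
    """Count the pixels where two glyphs differ."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognise(glyph):
    """Return the letter whose template is closest to the scanned glyph."""
    return min(TEMPLATES, key=lambda letter: distance(glyph, TEMPLATES[letter]))


# A slightly smudged "T" (one stray pixel in the bottom row).
smudged_t = ("111", "010", "011")
print(recognise(smudged_t))
```

Note that nothing in there looks at neighbouring characters or at a dictionary. Each shape is simply matched to the nearest known letter, which is why I wouldn't expect the language of the book to matter.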

So the fact that the book was partly in French and partly in English is probably irrelevant. More likely, either the software is poor or the original printed pages are difficult to read for some reason.

To determine which part of the system isn't working properly, try eliminating each variable in turn. Start by scanning an image. Does the result look like the original? If so, the scanner itself is probably OK. Next, try scanning a simple page of text, with a single clear font. If the OCR fails to convert it, then it's the software that's at fault.

Finally, if you can get access to a different type of scanner, test it with the English / French book that was causing the problem. If the results are still bad, that suggests that the problem lies in the quality of the printed page, or perhaps in the fonts.
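If you'd rather put a number on "bad" than judge by eye, something like the following rough Python sketch might help. It assumes you type in (or already have) the correct text for a test page and paste the OCR output next to it. The function names and the 0.95 threshold are my own assumptions, not anything standard, so adjust to taste.

```python
from difflib import SequenceMatcher


def ocr_accuracy(expected, ocr_output):
    """Rough similarity between the known text and the OCR result (0.0 to 1.0)."""
    return SequenceMatcher(None, expected, ocr_output).ratio()


def diagnose(simple_page_score, problem_book_score, threshold=0.95):
    """Apply the elimination steps: simple test page first, then the problem book.

    The threshold is an arbitrary cut-off chosen for this example.
    """
    if simple_page_score < threshold:
        return "OCR fails even on a simple page: suspect the software"
    if problem_book_score < threshold:
        return "Simple page is fine, book is not: suspect the printed page or fonts"
    return "Both pages convert well: no obvious fault"


# Hypothetical sample texts, just to show how the pieces fit together.
simple_score = ocr_accuracy("The quick brown fox", "The quick brown fox")
book_score = ocr_accuracy("Il etait une fois", "1l eta1t une f0is")
print(diagnose(simple_score, book_score))
```

The score is only a crude measure of how many characters match, but it is enough to tell a clean conversion from a garbled one when you repeat the test with a second scanner.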

I hope you manage to find a solution.
