I once OCR'd a book that had half the text in Swedish and half in Russian.
Swedish has three extra characters (å, ä, ö) and Russian is written in Cyrillic script. To add to the challenge, the book was full of difficult graphics and pictures.
It needed a lot of hands-on corrections. But it DID work.
Abbyy is actually a Russian company, and they are very international in their outlook. I was surprised at how good Abbyy was at Swedish.
First of all you tell it what languages the text is in, then you have to manually "teach" it to recognise characters it's unfamiliar with. For example, italics make recognition harder, as do any fancy or decorative fonts. OCR likes Arial and Times New Roman, non-bold and non-italic.
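For anyone who prefers scripting, the "tell it the languages first" step can be sketched with a different tool. This is not Abbyy: it assumes the open-source Tesseract engine with the pytesseract and Pillow packages and the Swedish and Russian language data installed. The function names and the image path are made up for illustration.

```python
import sys


def tesseract_lang_arg(languages):
    """Join Tesseract language codes into the 'swe+rus' form it expects."""
    codes = [code.strip() for code in languages if code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_page(image_path, languages=("swe", "rus")):
    """OCR one scanned page, telling the engine which languages to expect.

    Hypothetical helper: assumes Tesseract, pytesseract and Pillow are
    installed, along with the 'swe' and 'rus' language data.
    """
    # Imported lazily so the helper above works without OCR installed.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(
        Image.open(image_path), lang=tesseract_lang_arg(languages)
    )


if __name__ == "__main__":
    # Usage: python ocr_page.py scanned_page.png
    if len(sys.argv) > 1:
        print(ocr_page(sys.argv[1]))
```

This only covers the language selection. The manual "teaching" of unfamiliar characters and the hands-on corrections still have to happen afterwards.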
Last edited by martienne; 07-11-2014 at 08:14 PM.