Any options besides PDF for mixed language documents?
I picked up a cheap scanner today and I am disappointed. I tried scanning a paperback book in English, and it did a terrible job, with lots of weird symbols all over the place. Then I tried the teaching guides, which are the main reason I wanted the scanner. What a mess! It seems the problem is that the text is half in French and half in English (it has prompts in English telling you what to say in French to the kids, for example "say 'je suis ici' while pointing at yourself"). So when I set the scanner to OCR mode with the language set to English, I got gibberish. When I set it to French, things improved a little and it got much of the text, but the result still needed a lot of cleaning up.
I thought maybe it was just that the software which came with the scanner was not that great. So I downloaded a few utilities which claim to extract text from PDFs. They had great reviews. They totally choked on the French parts.
The PDF looks fine (I made a two-page sampler for testing purposes), but displays a bit too small for easy reading on the Sony. I uploaded it as a PDF, LRF and epub separately. The epub would not zoom at all (i.e. the page looked the same no matter what). The LRF looked just like the PDF at the lowest zoom, but when I tried to zoom in, the text got garbled, just as it had when I tried to extract it from the PDF.
So, there are three possibilities here:
1) The scanner is not that great
2) The scanner is fine and I just need better software
3) Dual-language files are too hard and I am stuck with PDF
What do you think? Is there anything I can do here, or will I go to all this work just to wind up with itty bitty text in a PDF file? If so, it may not be worth scanning them all...