12-18-2009, 03:23 AM | #1 |
Wizard
Posts: 2,409
Karma: 4132096
Join Date: Sep 2008
Device: Kindle Paperwhite/iOS Kindle App
|
Any options besides PDF for mixed language documents?
I picked up a cheap scanner today and I am disappointed. I tried scanning a paperback book in English, and it did a terrible job: lots of weird symbols all over the place. Then I tried the teaching guides, which are the main reason I wanted the scanner. What a mess! It seems the problem is that the text is half in French and half in English (e.g. it has prompts in English telling you what to say in French to the kids, such as "say 'je suis ici' while pointing at yourself"). When I set the scanner to OCR mode with the language set to English, I got gibberish. When I set it to French, things improved a little and it got much of the text, but the result still needed a lot of cleaning up.
I thought maybe it was just that the software which came with the scanner was not that great. So I downloaded a few utilities which claim to extract text from PDFs. They had great reviews. They totally choked on the French parts.
The PDF looks fine (I made a two-page sampler for testing purposes), but displays a bit too small for easy reading on the Sony. I uploaded it as a PDF, LRF and epub separately. The epub could not zoom at all (i.e. the page stayed looking the same no matter what). The LRF looked just like the PDF on lowest zoom, but when I tried to zoom in, the text got garbled as it had when I tried to extract it from the PDF.
So, there are three possibilities here:
1) The scanner is not that great
2) The scanner is fine and I just need better software
3) Dual-language files are too hard and I am stuck with PDF
What do you think? Is there anything I can do here, or will I go to all this work just to wind up with itty bitty text in a PDF file? If so, it may not be worth scanning them all...
12-19-2009, 06:06 PM | #2 |
Punctuation Fetishist
Posts: 557
Karma: 1070000
Join Date: Nov 2008
Location: The Bluest Commonwealth In East America
Device: Kindle PW, Nexus 7 (2013), Galaxy S5 phone, Galaxy Tab 4 8.0
|
Probably 3. Maybe 2. All the scanner does is make a picture, so if you can see the picture, 1 is not it.
OCR software tries to recognize characters from the picture, and then turn them into words from a dictionary. You need two dictionaries, and some kind of referee to tell the OCR which one to use. There may not be such a thing.
Good luck,
Jack Tingle
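For what it's worth, the "two dictionaries plus a referee" idea can be sketched in a few lines of Python. This is only an illustration, not how any particular OCR package works: the tiny word lists, the function names `tokenize` and `pick_language`, and the tie-breaking rule are all assumptions made up for the example. The referee simply counts how many words of a text segment appear in each dictionary and picks the language with more hits.

```python
import re

# Toy word lists (assumption): a real referee would load full dictionaries.
ENGLISH = {"say", "while", "pointing", "at", "yourself", "the", "i", "am", "here"}
FRENCH = {"je", "suis", "ici", "le", "la", "et", "vous", "nous"}


def tokenize(text):
    """Lower-case the text and split it into runs of letters (accents included)."""
    return re.findall(r"[a-zàâçéèêëîïôùûüÿœ]+", text.lower())


def pick_language(segment, default="eng"):
    """Hypothetical referee: return 'eng' or 'fra' by counting dictionary hits.

    Ties (including segments with no known words) fall back to `default`.
    """
    words = tokenize(segment)
    english_hits = sum(word in ENGLISH for word in words)
    french_hits = sum(word in FRENCH for word in words)
    if french_hits > english_hits:
        return "fra"
    if english_hits > french_hits:
        return "eng"
    return default


# The prompt from the first post, already split at the quotation marks.
segments = ["say", "je suis ici", "while pointing at yourself"]
labels = [pick_language(segment) for segment in segments]
print(list(zip(segments, labels)))
```

The catch is that the referee needs readable words to vote on, so in practice it would have to run on a rough first OCR pass and then steer a second pass over each segment with the chosen dictionary.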