05-26-2008, 01:29 PM | #16 |
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
I am afraid Tesseract is not for me. I need some additional languages, I'd like better accuracy, and, most importantly, I need reasonable layout detection: the software must, at least, be able to detect paragraphs and store each one on a single line. That alone is worth the price difference for me. Thanks for the suggestion, though.
|
05-26-2008, 03:17 PM | #17 |
eBuchReisender
Posts: 41
Karma: 208
Join Date: May 2008
Location: Münster
Device: Palm Tungsten-E, iLiad
|
Paragraph detection is tricky with Tesseract, but (!) not completely hopeless: if the paragraphs are separated by a blank line, the break may be detected and would then be parsable as two line breaks. Though if I understand you correctly, you do not want something to tinker with, but a solution that actually solves the task.
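A minimal sketch of that post-processing step, assuming the OCR text output really does keep a blank line between paragraphs (which depends on the scan and is not guaranteed). The function name `paragraphs_to_lines` and the sample text are made up for illustration; nothing here is part of Tesseract itself, it only works on the plain text that comes out of it.

```python
import re


def paragraphs_to_lines(ocr_text: str) -> str:
    """Collapse each blank-line-separated paragraph onto a single line."""
    # Normalize Windows and old Mac line endings to plain "\n".
    text = ocr_text.replace("\r\n", "\n").replace("\r", "\n")
    # Two line breaks in a row (optionally with whitespace-only lines
    # in between) are treated as a paragraph boundary.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, join the wrapped lines with single spaces.
    joined = [" ".join(block.split()) for block in blocks]
    # Drop empty blocks and emit one paragraph per output line.
    return "\n".join(p for p in joined if p)


if __name__ == "__main__":
    # Hypothetical OCR output: two paragraphs, each wrapped over two lines.
    sample = "Gallia est\nomnis divisa\n\nin partes\ntres"
    print(paragraphs_to_lines(sample))
```

This does not try to rejoin words hyphenated at a line end, since in German text a trailing hyphen is often a real one; that part would still need tinkering by hand.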
I just ran a test with a book page I scanned today at 300 dpi (Caesar, Civil War, in German): 2124 characters, of which only 2 adjacent characters were misread. That is a rate of 99.91% (better than my typing). I recently had the chance to see ReadIris (free with an HP all-in-one device): its layout detection was really horrible. Very distinct columns were overlooked, and there were a lot of simple misreadings. Some years ago I had the ABBYY FineReader 8.0 trial and must admit I was a bit disappointed by the automagical layout detection, which needed quite a lot of manual editing. Hopefully this works better by now. IIRC they offer an educational discount ... If I hadn't been a Linux addict at the time, I surely would have bought it because of its fantastic recognition rate, except for text written in italics.
A bit OT: their language support, on the other hand, was/is awesome. I scanned in a Russian article, ran it through Babelfish and got at least a vague idea of what the author had written, without knowing much more Russian than 'spassibo' myself.
Good luck! Nergal |
|
05-26-2008, 05:30 PM | #18 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
Just saw today on TeleRead: Cuneiform has open-sourced its OCR:
http://www.cuneiform.ru/eng/index.html |
|