Thread: OCR to use
05-26-2008, 03:17 PM #17
Nergal
eBuchReisender
Posts: 41
Karma: 208
Join Date: May 2008
Location: Münster
Device: Palm Tungsten-E, iLiad
Paragraph detection is tricky with tesseract, but (!) not completely hopeless: if the paragraphs are separated by a blank line, the break may be detected and would then be parsable as two line breaks. Though if I understand you correctly, you don't want something to tinker with, but a solution that actually solves the task.
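To illustrate the "two line breaks" idea, here is a minimal Python sketch. It assumes the OCR result is plain text (for example tesseract's text output) in which a blank line marks a paragraph boundary and single line breaks are just the page layout. The function name `split_paragraphs` is hypothetical, and hyphenated words split across lines are not rejoined.

```python
import re


def split_paragraphs(ocr_text: str) -> list[str]:
    """Split plain OCR text into paragraphs.

    Assumption: a blank line (newline, optional spaces/tabs, newline)
    separates paragraphs; single newlines inside a paragraph are layout only.
    """
    blocks = re.split(r"\n[ \t]*\n", ocr_text)
    paragraphs = []
    for block in blocks:
        # Join the physical lines of one paragraph into a single line.
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    return paragraphs


if __name__ == "__main__":
    sample = "First line\nof paragraph one.\n\nSecond paragraph\nstarts here.\n"
    for p in split_paragraphs(sample):
        print(p)
```

If tesseract misses a blank line, the two paragraphs simply come out merged, so this only works as well as the underlying layout detection.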

I just ran a test with a book page I scanned today at 300 dpi (Caesar: Civil War, in German): of 2124 characters, only 2 adjacent characters were misread.
That is an accuracy of 99.91% (better than my typing).
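The 99.91% figure is simply the share of correctly read characters. The small Python helper below reproduces the arithmetic; `character_accuracy` is a hypothetical name, and it assumes errors were counted by hand as above rather than with an edit-distance based character error rate.

```python
def character_accuracy(total_chars: int, errors: int) -> float:
    """Percentage of characters recognized correctly: (total - errors) / total."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return 100.0 * (total_chars - errors) / total_chars


if __name__ == "__main__":
    # Figures from the test above: 2124 characters, 2 of them misread.
    print(f"{character_accuracy(2124, 2):.2f}%")
```

2122 / 2124 is about 0.99906, which rounds to 99.91%.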

Recently I had the chance to try ReadIris (bundled for free with an HP All-in-One device). Its layout detection was really horrible: clearly separated columns were overlooked, and there were a lot of simple misreadings.

Some years ago I tried the ABBYY FineReader 8.0 trial and must admit I was a bit disappointed by the automagical layout detection; it required quite a lot of manual editing. Hopefully it works better by now.

IIRC they offer an educational discount. If I hadn't been a Linux addict at the time, I surely would have bought it because of its fantastic recognition rate, except for text set in italics.

A bit OT: their language support was (and is) awesome, on the other hand. I scanned a Russian article, ran it through Babel Fish and got at least a vague idea of what the author had written, without knowing much more Russian myself than 'spassibo'.

Good Luck!

Nergal