05-26-2008, 01:29 PM   #16
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
Quote:
Originally Posted by Nergal
For the initial question: I recommend having a look at Tesseract OCR. It is an open-source command-line tool with an amazing recognition rate (95-99.9%, mostly 98-99% for me).
I am afraid Tesseract is not for me. I need some additional languages, I'd like better accuracy, and, most importantly, I need reasonable layout detection: the software must, at the very least, be able to detect paragraphs and store each one on a single line. That alone is worth the price difference for me. Thanks for the suggestion, though.
05-26-2008, 03:17 PM   #17
Nergal
eBuchReisender
Posts: 41
Karma: 208
Join Date: May 2008
Location: Münster
Device: Palm Tungsten-E, iLiad
Paragraph detection is tricky with Tesseract but (!) not completely hopeless: if the paragraphs are separated by a blank line, that might be detected and would be parsable as two line breaks. Though if I understand you correctly, you do not want something to tinker with, but a solution that actually solves the task.
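
To make the "two line breaks" idea concrete, here is a minimal Python sketch of the post-processing step. It assumes the OCR output is plain text in which paragraphs really are separated by at least one blank line; the function name `paragraphs_to_lines` is made up for this illustration and is not part of Tesseract. It does not try to rejoin words hyphenated across line ends.

```python
import re


def paragraphs_to_lines(ocr_text: str) -> str:
    """Put each blank-line-separated paragraph on a single line."""
    # Normalize Windows/old-Mac line endings to "\n".
    text = ocr_text.replace("\r\n", "\n").replace("\r", "\n")
    # A paragraph boundary is two line breaks, possibly with
    # whitespace-only lines in between.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Collapse the line breaks inside a paragraph into single spaces.
        words = block.split()
        if words:
            paragraphs.append(" ".join(words))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "First line of a paragraph\ncontinued here.\n\nSecond paragraph\nalso wrapped.\n"
    print(paragraphs_to_lines(sample))
```

If the scan has no blank lines between paragraphs (only indentation, say), this approach will merge everything into one line, so it is only a starting point for tinkering.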

I just ran a test with a book page I scanned today at 300 dpi (Caesar, Civil War, in German): 2124 characters, of which only 2 adjacent characters were misread.
So that is a rate of 99.91% (better than my typing).
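
For anyone who wants to compare their own scans the same way, the arithmetic is simply (total - misread) / total. A small Python sketch; the helper name `character_accuracy` is just for illustration:

```python
def character_accuracy(total_chars: int, misread_chars: int) -> float:
    """Return the share of correctly recognized characters, in percent."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    if not 0 <= misread_chars <= total_chars:
        raise ValueError("misread_chars must be between 0 and total_chars")
    return 100.0 * (total_chars - misread_chars) / total_chars


if __name__ == "__main__":
    # Numbers from the test page above: 2124 characters, 2 misread.
    print(f"{character_accuracy(2124, 2):.2f}%")  # prints 99.91%
```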

Recently I had the chance to try ReadIris (bundled for free with an HP all-in-one device). Its layout detection was really horrible: very distinct columns were overlooked, and there were a lot of simple misreadings.

Some years ago I tried the ABBYY FineReader 8.0 trial and must admit I was a bit disappointed by the automagical layout detection; it needed quite a lot of manual editing. Hopefully this works better by now.

IIRC they offer an educational discount ... If I hadn't been a Linux addict at the time, I surely would have bought it for its fantastic recognition rate, except for text set in italics.

A bit OT: their language support, on the other hand, was/is awesome. I scanned a Russian article, ran it through Babelfish, and got at least a vague idea of what the author had written, without knowing much more Russian than 'spassibo' myself.

Good Luck!

Nergal
05-26-2008, 05:30 PM   #18
igorsk
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
Saw just today on Teleread: Cuneiform open-sourced their OCR:
http://www.cuneiform.ru/eng/index.html
All times are GMT -4.