Quote:
Originally Posted by retiredbiker
I do multi-column old magazine stories, the pdf coming from, say, Internet Archive. Any text that is already in these is worthless, it would take forever to correct it by hand.
That's close to my own use case, though my sources for pulps and weird fiction are often more or less proofread - it's just that they have also tried to replicate the original's layout, which makes them a pain to read on an e-reader... and often it's the only source available.
I have used gImageReader to OCR a couple of sources that were only available as scans (as you noted, Archive's TXT or EPUB versions are often worthless), but only for short texts. I'll have to check out OCRFeeder.