MobileRead Forums - View Single Post

Lukusaukko · 02-24-2023, 10:43 AM

Quote:

Originally Posted by retiredbiker

I do multi-column old magazine stories, the PDF coming from, say, Internet Archive. Any text that is already in these is worthless; it would take forever to correct it by hand.

That's close to my use case, though my sources for pulps and weird fiction are often more or less proofread - it's just that they have also tried to replicate the original's layout, which makes them a pain to read on an e-reader... and often that's the only source available.

I have used gImageReader to OCR a couple of sources that were only available as scans (as you noted, Archive's TXT or EPUB versions are often worthless), but only for short texts. I'll have to check out OCRFeeder.