MobileRead Forums - View Single Post - how to convert a scanned page from a book (looks like photo of page) to clean text?

DSpider · 11-23-2012, 03:22 PM

Quote:

Originally Posted by GMcG

@neuvivlio

If the book is already scanned and you have a PDF file, then why can't you open it in Acrobat Reader and save it as txt?
(File --> save as txt)?

George

Because the pages in a scanned PDF are basically JPG images. While some scanners may apply some half-assed OCR underneath those images ("positional OCR"), it's far inferior to ABBYY FineReader. Adobe Acrobat can OCR it as well, but the engine behind it is very poor.

Also, saving it as plain text is just awful for e-books, because there's absolutely no formatting at all (italics, bold, chapter titles, etc.). Italics are the soul of a book, and they're what make the reading experience enjoyable, especially when used right. Trying to spot them in the scans and then re-add them by hand is pure madness. You're bound to miss a few unless you spend a SIGNIFICANT amount of mental effort and go over them at least twice.
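
To make the formatting point concrete, here's a rough Python sketch (my own illustration, not something any of the tools above do for you). It assumes your OCR program can export HTML with <i>/<b> tags, and it shows what a plain "save as txt" throws away versus keeping the emphasis as text markers. The `flatten` helper and the `_italic_` / `*bold*` marker choice are hypothetical; pick whatever your e-book workflow expects.

```python
from html.parser import HTMLParser

# Hypothetical marker scheme (an assumption for this sketch):
# italics become _text_, bold becomes *text*.
MARKERS = {"i": "_", "em": "_", "b": "*", "strong": "*"}


class _Flattener(HTMLParser):
    """Collect the text of an HTML fragment, optionally marking emphasis."""

    def __init__(self, keep_emphasis):
        super().__init__()
        self.keep_emphasis = keep_emphasis
        self.parts = []

    def _marker(self, tag):
        # Non-emphasis tags (p, div, ...) contribute nothing to the output.
        return MARKERS.get(tag, "") if self.keep_emphasis else ""

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._marker(tag))

    def handle_endtag(self, tag):
        self.parts.append(self._marker(tag))

    def handle_data(self, data):
        self.parts.append(data)


def flatten(html, keep_emphasis=True):
    """Return the text of `html`; emphasis tags become markers if requested."""
    parser = _Flattener(keep_emphasis)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = "<p>She read <i>Moby-Dick</i> and was <b>not</b> amused.</p>"
plain = flatten(sample, keep_emphasis=False)   # what "save as txt" gives you
marked = flatten(sample, keep_emphasis=True)   # emphasis survives as markers
```

With `keep_emphasis=False` the italics and bold are simply gone, which is exactly the information you'd otherwise have to hunt down in the scans by eye.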