MobileRead Forums - View Single Post - Converting a scanned book from 1DollarScan to ePub

Hitch · 09-29-2014, 09:04 PM

Quote:

Originally Posted by Ghitulescu

I've seen a lot of scanned books in my life.
Frankly, I would rather type them by hand than correct their spelling mistakes and/or pagination.

I believe a lot of the people who answered are native English speakers. Well, any OCR software can be trained to recognize 26 letters, but for users of non-ASCII scripts (like Bangla above) the errors increase tenfold. With diacritics, it may even be that scanning artifacts (like random black dots) create a new character.

A good example of what I mean can be found on archive.org. Compare the PDF (scanned, but with a text layer) and the EPUB files.

Incompetent scanning and OCR will always result in poor-quality output. A good scanner, with competent OCR, can achieve a 99.995% accuracy rate. That's imperfect, but not bad. Of course, $1/scan and that ilk aren't going to give you 99.995%, because they're not running human A/B compares, which is, realistically, the only way to get to that level of quality. {shrug}.
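
To put those percentages in perspective, here is a rough back-of-the-envelope sketch. It assumes the accuracy figure is per character, and the 500,000-character book length and the two lower accuracy levels are made-up numbers for comparison only, not anything measured.

```python
def expected_errors(char_count: int, accuracy: float) -> float:
    """Expected number of wrong characters at a given per-character accuracy."""
    return char_count * (1.0 - accuracy)


# Hypothetical book length (an assumption for illustration only).
book_chars = 500_000

# 0.99995 is the figure quoted above; 0.99 and 0.999 are comparison points.
for accuracy in (0.99, 0.999, 0.99995):
    errors = expected_errors(book_chars, accuracy)
    print(f"{accuracy:.3%} accurate -> about {errors:.0f} errors to find and fix")
```

Under those assumptions, 99.995% still leaves a couple dozen errors in the book, which is why the proofreading pass never goes away entirely.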

I certainly would not consider typing a book instead of scanning it. No offense, but I find the idea crazy. Take a high-quality scan, a good A/B, run it through Toxaris' program, and you have a very, very high quality starting place.
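
For anyone wondering what a compare pass can look like in practice, here is a minimal sketch. To be clear, this is not Toxaris' program and not necessarily how any given shop runs its A/B. It simply assumes you have two independent OCR passes of the same pages as plain text, and it uses Python's standard difflib to flag the lines where they disagree, so a human only has to check those against the page image. The function name and sample text are made up.

```python
import difflib


def flag_disagreements(pass_a: str, pass_b: str) -> list[tuple[int, str, str]]:
    """Return (line number in pass A, text from A, text from B) wherever two OCR passes differ."""
    a_lines = pass_a.splitlines()
    b_lines = pass_b.splitlines()
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    flagged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # both passes agree on these lines; nothing to review
        flagged.append((i1 + 1, "\n".join(a_lines[i1:i2]), "\n".join(b_lines[j1:j2])))
    return flagged


# Made-up sample: the first pass misreads "m" as "rn" on the second line.
ocr_a = "It was the best of times,\nit was the worst of tirnes,"
ocr_b = "It was the best of times,\nit was the worst of times,"

for line_no, text_a, text_b in flag_disagreements(ocr_a, ocr_b):
    print(f"line {line_no}: A={text_a!r} B={text_b!r}")
```

Lines where both passes make the same mistake won't be flagged, so this narrows down the grunt work rather than replacing it.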

The problem we see on these forums--all the time--is that nobody ever wants to do the "grunty" work of correcting the scanned material. Everybody wants a magic bullet. It doesn't exist.

Hitch