09-29-2014, 09:04 PM   #13
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by Ghitulescu View Post
I've seen a lot of scanned books in my life.
Frankly, I would rather type them by hand than correct their spelling mistakes and/or pagination.

I believe a lot of the people who answered are native English speakers. Well, any OCR software can be trained to recognize 26 letters, but for non-ASCII users (like Bangla above) the errors increase tenfold. With diacritics, scanning errors (like random black dots) may even create a new character.

A good example of what I mean can be found on archive.org. Compare the PDF (scanned, but with a text layer) and the EPUB files.
Incompetent scanning and OCR will always result in poor-quality output. A good scanner, with competent OCR, can achieve a 99.995% accuracy rate. That's imperfect, but not bad. Of course, the $1/scan shops and their ilk aren't going to give you 99.995%, because they're not running human A/B compares, which is, realistically, the only way to get to that level of quality. {shrug}
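
To put that number in perspective, here is a rough back-of-the-envelope sketch. It assumes the 99.995% is a per-character accuracy and that a book runs about 500,000 characters; both are my assumptions for illustration, not measured figures. The function name and the lower comparison rates are hypothetical too.

```python
def expected_errors(char_count: int, accuracy_pct: float) -> float:
    """Expected number of wrong characters at a given per-character accuracy."""
    return char_count * (1 - accuracy_pct / 100)


# Assumed book length in characters (illustrative only).
BOOK_CHARS = 500_000

# 99.995 is the rate quoted above; the lower rates are hypothetical
# comparison points, not figures for any particular service.
for accuracy in (99.995, 99.9, 99.0):
    errors = expected_errors(BOOK_CHARS, accuracy)
    print(f"{accuracy}% accurate: about {errors:,.0f} characters to fix")
```

Under those assumptions, 99.995% leaves roughly 25 wrong characters in the whole book, while 99% leaves about 5,000. That gap is the difference between a light proofread and a major cleanup job.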

I certainly would not consider typing a book instead of scanning it. No offense, but I find the idea crazy. Take a high-quality scan, a good A/B, run it through Toxaris' program, and you have a very, very high quality starting place.

The problem we see on these forums--all the time--is that nobody ever wants to do the "grunty" work of correcting the scanned material. Everybody wants a magic bullet. It doesn't exist.

Hitch