01-30-2010, 12:15 AM   #9
Solitaire1
Samurai Lizard
Posts: 14,943
Karma: 69500000
Join Date: Nov 2009
Device: NookColor, Nook Glowlight 4
Quote:
Originally Posted by Kolenka
I think it has a bit to do with the source of where some of the e-book library is coming from: scanning.

A publisher likely doesn't actually do the right thing and keep digital copies of books around for very long, or never had them. So they have to pay someone to scan, OCR, and proofread the scans. Odds are the guy(s) doing this aren't actual editors, and their minds likely go numb partway through each book, so they just aren't catching the errors.

I know every time I scan a book, I read through it twice before I'm confident I've caught /most/ of the errors. I'm betting these scans only go through one read in the OCR software.
If the publisher isn't going to maintain the electronic version of the book, then a printed version that is easy to scan should be retained as an archive copy. It should be printed and formatted so that a scanner can read it accurately: set in an OCR font like "OCR-B" [where each character is distinctly different from every other character], with each paragraph separated by a blank line, and with annotations to clarify formatting where needed.

I suspect one of the reasons that errors slip through is that the original book was printed in a typeface that is hard to scan accurately, or in a size too small to be read easily. One of the reasons I prefer to read in a serif font is that it is easy to tell similar characters apart (in some sans-serif fonts it is easy to confuse a "1" and a lowercase "L", or an "O" and a "0" [the number zero]).
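
To illustrate the kind of mix-up I mean, here is a rough sketch of how a proofreader might flag likely "1"/"l" and "0"/"O" swaps in OCR output. It is only a heuristic under my own assumptions: the function name and the confusable lists are made up for this example, and it will also flag legitimate tokens such as "1st", so a person still has to review the results.

```python
import re

# Hypothetical confusable characters, taken from the examples in this post:
# digit one vs. lowercase L, and digit zero vs. uppercase O.
CONFUSABLE_CHARS = set("1l0O")

def suspicious_tokens(text):
    """Return tokens that mix letters and digits and contain a confusable character."""
    flagged = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_alpha = any(c.isalpha() for c in token)
        has_digit = any(c.isdigit() for c in token)
        has_confusable = any(c in CONFUSABLE_CHARS for c in token)
        # A word with a stray digit (or a number with a stray letter)
        # is a common sign of an OCR misread.
        if has_alpha and has_digit and has_confusable:
            flagged.append(token)
    return flagged

if __name__ == "__main__":
    sample = "He wa1ked 10 miles t0 the Old mill in 1984."
    print(suspicious_tokens(sample))
```

Pure words ("Old") and pure numbers ("10", "1984") pass through untouched; only tokens that mix the two get flagged for a second look.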