Quote:
Originally Posted by Xenophon
And indeed, they do farm it out. It turns out, however, that getting clean copy is more difficult than it might seem. Arnold Bailey (the web-guy who takes care of these things for Baen) has been through a bunch of different contractors/services/etc. for scanning/OCR/checking. I think he has some reliable choices now, but I seem to recall that it took a bunch of tries to get results good enough for publication.
It certainly isn't impossible, but neither is it so cheap and easy as you make it sound. Unless, of course, you don't mind having one or more OCR errors per page.
If you want to learn more than you knew there was to learn about OCR and OCR errors, go hang out on the Distributed Proofreaders fora.
Xenophon
If you look at Gutenberg texts you will still find quite a few errors. It is certainly not easy to find them all. Someone (probably more than one person) has to basically read the entire text and be alert to the kinds of errors that appear, and in many cases you also need to have the original book on hand to look things up. In my experience, there are lots more problems in the text than just simple letter substitutions.
Dale