Quote:
Originally Posted by pwalker8
I find that odd formatting is the rule rather than the exception in the books I'm interested in scanning. The last book I scanned used a two-column format, and that took a long time to get right. Old paperbacks, where the ink has blurred a bit and the paper has browned with age, tend to have more scan errors than hardback books. I also find that the books I'm interested in scanning tend to be rather longer than the old 200-page pulp novels.
Do you scan 500+ ebooks a year? That's what we are actually talking about: a commercial ebook scanner who scans and edits ebooks day in and day out, not someone who does it occasionally for fun. There are a number of people on this board who scan and edit books for fun. Perhaps some will chime in and say how long it takes them. I'm probably one of the slower workers here. I might be able to scan 10 pages in an hour and generate decent text, but it won't be formatted or proofed.
I'm not a professional, but I've scanned a number of ebooks for Distributed Proofreaders. A first pass at Jack of Shadows took me about an hour, but I expect that a second pass in Sigil to make me mostly happy would take at least another hour. But, as you noted, Jack of Shadows is a fairly short book. My guess is that a longer work, like Lord of Light, would take at least 8 hours. And I'm being lazy and assuming that I can spot most of the scan errors just by eyeballing the results. If I were doing this professionally, I'd probably do things like check for common scannos, generate word lists, and maybe even run two different OCR tools and compare the differences.
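For anyone curious what those checks might look like, here's a rough Python sketch of three of them: a word-frequency list, a lookup against a scanno list, and a word-by-word diff of two OCR outputs using the standard library's difflib. The scanno list and the function names are made up for illustration; this isn't how Distributed Proofreaders or any particular OCR tool does it.

```python
import difflib
import re
from collections import Counter

# Hypothetical, tiny sample of common OCR confusions ("scannos").
# A real list would be much longer and tuned to the typeface being scanned.
COMMON_SCANNOS = {"tbe": "the", "arid": "and", "modem": "modern", "cf": "of"}


def words(text):
    """Split text into lowercase words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def word_list(text):
    """Count word frequencies; words that appear only once are worth eyeballing."""
    return Counter(words(text))


def find_scannos(text):
    """Return (word, likely_correction) pairs for words found in the scanno list."""
    return [(w, COMMON_SCANNOS[w]) for w in words(text) if w in COMMON_SCANNOS]


def ocr_disagreements(text_a, text_b):
    """Compare two OCR outputs word by word and return the spans where they differ."""
    a, b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    ocr_a = "He walked into tbe room arid sat down."
    ocr_b = "He walked into the room and sat down."
    print(find_scannos(ocr_a))
    print(ocr_disagreements(ocr_a, ocr_b))
```

Diffing two OCR engines is useful because they tend to make different mistakes, so the places where they disagree are a short list of spots to check by hand. The catch is that a word like "arid" or "modem" is a real word, so a scanno list only flags candidates; you still have to look at each one.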
I'd have to say that even a simple genre novel still needs careful work on the formatting. My favorite example of that is Norman Spinrad's The Mind Game. As near as I can tell, he took a darknet version of his novel, turned it into a Kindle book, and is now selling it on Amazon. I got it for free at one point. I was really disappointed with it, because unlike some other SF authors publishing from their darknet editions (Walter Jon Williams, for example), Spinrad apparently skipped the proofreading step. It was missing italics, and short paragraphs made up of several one-line quotations, maybe with a "he said" at the beginning or end of a line, were split into a separate paragraph for each line, which made it impossible to follow the conversations. I posted a comment recommending that people not buy the Kindle edition because of the crappy formatting, but as far as I can tell, Spinrad has never fixed it.