Quote:
Originally Posted by jangell2
I'm going to send off a pocket book to one of the scan services and have it returned as a Word doc. Can anyone give me an idea about the level of effort required to proof it? What are the typical errors that will be found? Do I have to read the whole book? Will the Word Add-In catch most of the problems?
I'll let Tox address what the Word add-in will catch, but as a commercial formatter of eBooks, I can tell you that the effort to proof a scan--particularly if you're using one of the cheaper scanning services--is fairly significant. In some ways, it's far harder than editing/proofing the book the first time around, because you have to proof the Word file against the PDF, to ensure that you don't have scanning errata (typical is, for example, "hat" for "fiat," and other errors like that, particularly surrounding ligatures).
You'll have a lot of text-sizing errors, which are represented in the output HTML as spans. Lots, and lots, and lots of spans. (DUH, corrected, thank you, Peter!).
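To give a feel for the span cleanup, here is a minimal Python sketch, not anything the scanning services or the Word add-in actually run. It assumes the stray spans carry nothing but a double-quoted font-size style, and it leaves every other span (italics, classes and so on) alone. The function name strip_font_size_spans is made up for illustration.

```python
import re

# Matches any opening or closing <span> tag and captures its attributes.
SPAN_TAG = re.compile(r"<(/?)span\b([^>]*)>", re.IGNORECASE)
# Assumption: a "text-sizing" span has a single style attribute holding only font-size.
FONT_SIZE_ONLY = re.compile(
    r'^\s*style\s*=\s*"\s*font-size\s*:[^";]*;?\s*"\s*$', re.IGNORECASE
)

def strip_font_size_spans(html):
    """Unwrap font-size-only spans, keeping their text; return (html, count)."""
    out = []
    dropped_stack = []  # one flag per open span: True if its open tag was dropped
    pos = 0
    removed = 0
    for match in SPAN_TAG.finditer(html):
        out.append(html[pos:match.start()])
        pos = match.end()
        is_closing, attrs = match.group(1), match.group(2)
        if not is_closing:
            drop = bool(FONT_SIZE_ONLY.match(attrs))
            dropped_stack.append(drop)
            if drop:
                removed += 1
            else:
                out.append(match.group(0))
        else:
            # Drop the closing tag only if its matching open tag was dropped.
            drop = dropped_stack.pop() if dropped_stack else False
            if not drop:
                out.append(match.group(0))
    out.append(html[pos:])
    return "".join(out), removed

sample = '<p>Lots, <span style="font-size: 11.5pt">and</span> <span class="i">lots</span></p>'
cleaned, count = strip_font_size_spans(sample)
print(count, cleaned)
```

The count is worth a look before you trust the output: if it is in the thousands, that is normal for a scan, but spot-check a few pages to make sure no intentional sizing (chapter heads, for example) got flattened.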
Then you'll have the fairly endless broken paragraph and page-ending errors; those are ubiquitous. The Word add-in does a pretty good job at finding possible broken paragraphs, particularly those broken mid-sentence.
What it can't do--and no automated system can--is find those paragraphs that have the end of one sentence at or near the right-hand margin on one page, and continue, flush-left, with a capitalized first letter at the top of the next page. Only human reading and decision-making can handle those.
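The difference between the two cases can be sketched in a few lines of Python. This is a rough heuristic of my own for illustration, not how the Word add-in works: it assumes the document has already been reduced to a list of paragraph strings, and the names ends_sentence and find_suspect_breaks are hypothetical. It flags a break when a paragraph stops without sentence-ending punctuation, or when the next one starts with a lowercase letter.

```python
CLOSERS = "\"')]\u201d\u2019"   # quotes and brackets that may follow the final period
TERMINATORS = ".!?:"

def ends_sentence(text):
    """True if the paragraph ends with sentence-ending punctuation."""
    stripped = text.rstrip().rstrip(CLOSERS)
    return bool(stripped) and stripped[-1] in TERMINATORS

def find_suspect_breaks(paragraphs):
    """Return indexes of paragraphs that look broken off from the next one."""
    suspects = []
    for i in range(len(paragraphs) - 1):
        current = paragraphs[i]
        following = paragraphs[i + 1].lstrip()
        if not current.strip() or not following:
            continue  # skip blank paragraphs
        if not ends_sentence(current) or following[0].islower():
            suspects.append(i)
    return suspects

paragraphs = [
    "He walked to the",
    "door and opened it.",
    "The hall was empty.",
    "She had left hours ago.",
]
print(find_suspect_breaks(paragraphs))
```

Only the mid-sentence break after the first paragraph gets flagged. The break between the last two paragraphs might be a real paragraph break or a page boundary that split one paragraph in two, and nothing in the text tells the code which. Headings, which usually lack end punctuation, will also show up as false positives. That is the part you have to read.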
This--this very thing, proofing post-scan Word files--is the single biggest obstacle that we have with authors/publishers doing their backlists into eBooks. NONE of them want to do this step. ALL of them think that either a) this should be the scanning company's job, or b) this should be the formatting company's job. None of them are willing to accept that as publishers, it's THEIR job.
The single biggest glitch is that they have zero interest in learning the underpinnings of Word. Most of them don't know how to see pilcrows, much less figure out page breaks versus section breaks, etc. The scanning companies don't want to proof for that type of errata (broken paragraphs and the like), and we certainly don't; we're not editors or proofreaders.
(Sorry, I digress). Anyway, that's what you should expect. Tox, who has done thousands of scans, will likely have more feedback, but that's at the top of what I see. Broken paragraphs; section/page breaks that have to be removed; paragraphs that may or may not be breaking across pages; typical scan OCR errors; and font/text-sizing errors.
Oh, yes: you absolutely have to proofread the whole thing.
Offered FWIW.
Hitch