09-01-2010, 02:03 PM   #101
Elfwreck
Grand Sorcerer
Posts: 5,187
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
Quote:
Originally Posted by HarleyB
OK so for a typical fiction novel that is somewhere between 300 and 500 pages how long is it going to take me to scan it and turn it into an ePub? Can anyone who routinely does this share? I expect I would be a bit slower to start with but a rough guide would be fine.

This is a serious question as I have never considered doing this as I imagined it would take forever.
Scanning: Chop cover off, less than 1 hr scanning with a document-feeding, duplex scanner. (Scanning on a flatbed scanner: assume 4 pgs per minute with a *fast* scanner and an efficient setup.) I use 400 dpi for books I plan to OCR, unless they've got lots of footnotes, in which case I use 600 dpi.
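For a rough sense of the flatbed figure, here is a minimal back-of-the-envelope sketch. It assumes the 4 pages per minute quoted above holds steady for the whole book, which is optimistic; the function name and default are made up for illustration.

```python
def flatbed_minutes(pages: int, pages_per_minute: float = 4.0) -> float:
    """Estimated minutes to scan `pages` at a steady rate (assumed, not measured)."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return pages / pages_per_minute


# The 300-500 page range from the question, at the assumed 4 pages per minute.
for pages in (300, 500):
    print(f"{pages} pages: about {flatbed_minutes(pages):.0f} minutes")
# prints:
# 300 pages: about 75 minutes
# 500 pages: about 125 minutes
```

So a flatbed is roughly 1.25 to 2 hours of hands-on page turning for a typical novel, versus under an hour of mostly unattended feeding with a duplex sheet-fed scanner.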

Deskew, despeckle, otherwise fix scanned images: less than 10 minutes if (1) you've set up the scanning well, (2) you have the right software and (3) you really know what you're doing. Or skip this step & add time to later steps.

OCR: Auto-process: less than an hour with decent (cheap, but not free) software.
Fixing OCR errors: 1-50 hours, depending on a wide range of variables, including target audience (how much do you care if "corn cob" reads as "com cob"?), scan quality (somewhat adjustable, subject to limits of scanning device), typesetting in original ("fancy" fonts are hard to OCR), software being used (easier & faster to correct in FineReader Pro than in Word while flipping through physical pages).

Output & Formatting into ebook: 10 minutes to 10+ hours, depending on how much you care about formatting and what software you're using.

For me personally:
Chop-scan: under 1 hour and I don't care; I'm reading while the machine is feeding pages.
OCR zoning: ~10 minutes to set up so I'm not OCRing the chapter headers & page numbers.
OCR processing: Who cares; I'm watching TV for that part.
OCR Correction: 1/2 hour doing search-and-replace for very common errors (1 for I; arid for and).
Output to RTF.
Remove page breaks; set whole thing to 14 pt Times; save onto Ebook reader & ignore OCR glitches.
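
The search-and-replace pass in the list above could also be scripted on plain-text output. This is only a sketch, not the tool used here: the two patterns are my guess at how to express "1 for I" and "arid for and", and they are blind replacements, so a genuine "arid" or a real numeral 1 followed by a lowercase word will be changed too.

```python
import re
from typing import Tuple

# Hypothetical correction table: (compiled pattern, replacement).
# Only the two error types mentioned in the post are covered.
COMMON_FIXES = [
    # A standalone "1" followed by a lowercase word is assumed to be a misread "I".
    (re.compile(r"\b1\b(?=\s+[a-z])"), "I"),
    # Lowercase "arid" as a whole word is assumed to be a misread "and".
    (re.compile(r"\barid\b"), "and"),
]


def fix_common_errors(text: str) -> Tuple[str, int]:
    """Apply every fix in COMMON_FIXES; return the new text and the number of changes."""
    total = 0
    for pattern, replacement in COMMON_FIXES:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total


sample = "1 went to the market arid bought corn."
fixed, changes = fix_common_errors(sample)
print(fixed)    # I went to the market and bought corn.
print(changes)  # 2
```

Reporting the change count makes it easier to spot a pattern that fires far more often than expected, which usually means it is eating legitimate text.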

Under an hour of pay-attention processing, for a book not suitable for sharing with anyone else even if I were allowed to do so.

For books I've read before & intend to re-read, I'll spend much more time on OCR correction; I don't for new books because I'd rather deal with punctuation errors than spoil the story for myself.