I scan perhaps 24 old paperbacks per year, partly to ease space restrictions, partly to provide durable copies of paperbacks that are falling apart. Standard procedure is:
- split book into 32-page signatures
- scan signatures using Canon P150 duplex sheet scanner
- OCR the output using ABBYY FineReader
- correct the obvious typos in ABBYY, mark where sentences run over from one page to the next, identify required and redundant hyphens at page breaks and then output to a .txt file
- further editing in Notepad++: systematic treatment of speech marks, ellipses, en dashes, ligatures and apostrophes; specify the language as HTML and use entities; add markers to divide the text into chapters; use regex to tidy line breaks, etc.
- construct an empty e-book in Sigil using standard components (pre-canned title page, front matter, chapter styles, pre-defined CSS sheets)
- move text from Notepad++ to Sigil, add italics, additional breaks etc plus further styling
- create a "nice" front cover (usually scanned from the original cover) and add that
- add to e-reader via Calibre
I generally reckon on 4 to 10 hours' work per e-book, depending on the complexity of the book and the quality of the OCR output. I feel the work is justified as I gain digital copies of titles which are unlikely to be digitised by the publisher, at least in my lifetime!
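
For anyone who would rather script the Notepad++ clean-up step, here is a rough Python sketch of the same idea. It is not the exact set of regexes I use: the function name `tidy`, the ligature table and the quote rules are all assumptions for illustration, and it assumes the OCR text uses straight quotes and blank lines between paragraphs. Nested quotes and leading apostrophes ('tis, 'em) will still need checking by hand.

```python
import html
import re

# Ligature glyphs that OCR sometimes emits, mapped back to plain letters
# (illustrative table, extend as needed).
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def tidy(text: str) -> str:
    """Hypothetical helper: turn plain OCR text into entity-encoded text."""
    # Escape &, < and > first so they cannot clash with the entities added below.
    text = html.escape(text, quote=False)
    # Expand ligature glyphs into ordinary letters.
    for glyph, letters in LIGATURES.items():
        text = text.replace(glyph, letters)
    # Join single line breaks inside a paragraph; blank lines (paragraph breaks) stay.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces or tabs left over from the join.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Three dots (optionally spaced) become an ellipsis entity.
    text = re.sub(r"\.\s?\.\s?\.", "&hellip;", text)
    # A spaced hyphen between words becomes a spaced en dash.
    text = re.sub(r"(?<=\w) - (?=\w)", " &ndash; ", text)
    # Single quotes: opening if at the start or after whitespace, a bracket or a
    # double quote; every other one is treated as a closing quote or apostrophe.
    text = re.sub(r"(?<![^\s(\[\"])'", "&lsquo;", text)
    text = text.replace("'", "&rsquo;")
    # Double quotes: opening if at the start or after whitespace or a bracket.
    text = re.sub(r'(?<![^\s(\[])"', "&ldquo;", text)
    text = text.replace('"', "&rdquo;")
    return text


if __name__ == "__main__":
    sample = '"Wait - it\'s the \ufb01rst\nline..." he said.\n\nNext.'
    print(tidy(sample))
```

The order matters: escaping comes first so that the entities added later are not themselves escaped, and line joining comes before the punctuation rules so that an ellipsis or dash split across a line break is still caught.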