I buy old paperbacks specifically for destructive ebook creation. Covers are removed and the book is split into the publisher's signatures. Then the glue/gutter is removed. I use a Canon P-150 scanner, which feeds automatically and scans both sides. ABBYY FineReader works extremely well for the OCR process. However, OCR is never a complete success: words hyphenated over two pages, phrases in italics, poor quality original typescript etc. all require an extended bout of editing.
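As an illustration of one of those fixes, here is a rough Python sketch for rejoining words that OCR has left split across a line or page break. It is not what I actually run; the function name, the idea of checking against a word list, and the assumption that a page break shows up as one or more newlines in the exported text are all hypothetical.

```python
import re

# Assumption: a split word ends its line/page with a hyphen, followed by
# one or more newlines (the page break) before the rest of the word.
SPLIT_WORD = re.compile(r"(\w+)-[ \t]*\n\s*(\w+)")

def rejoin_split_words(text, known_words):
    """Rejoin words broken by a hyphen at a line or page break.

    known_words is a set of lowercase dictionary words (hypothetical input).
    If the joined form is a known word, the hyphen is dropped; otherwise
    the hyphen is kept, since it may be genuine (e.g. "well-known").
    """
    def fix(match):
        head, tail = match.group(1), match.group(2)
        joined = head + tail
        if joined.lower() in known_words:
            return joined
        return head + "-" + tail

    return SPLIT_WORD.sub(fix, text)
```

Anything the word list does not recognise keeps its hyphen, so names and unusual words still need checking by eye.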
I use Notepad++ for the basic editing: converting the text to HTML, amending quotation marks, correcting capitalisation and fixing paragraphing (my regex skills are s l o w l y improving).
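For anyone who prefers a script to search-and-replace in an editor, the quotation mark and paragraph steps might look something like this in Python. This is only a sketch under assumptions: the function names are made up, paragraphs are assumed to be separated by blank lines in the OCR output, and the quote rule is the simple "opening quote after whitespace or at the start" heuristic, which gets leading apostrophes such as 'tis wrong.

```python
import html
import re

def curl_quotes(text):
    """Convert straight quotes to typographic ones with a simple heuristic."""
    # A double quote at the start, or after whitespace/an opening bracket, opens.
    text = re.sub(r'(^|[\s(\[])"', r'\1“', text)
    # Every remaining double quote closes.
    text = text.replace('"', '”')
    # Same idea for single quotes; the rest are apostrophes or closing quotes.
    text = re.sub(r"(^|[\s(\[“])'", r"\1‘", text)
    text = text.replace("'", "’")
    return text

def to_html_paragraphs(text):
    """Wrap blank-line-separated blocks of text in <p> tags."""
    blocks = re.split(r"\n\s*\n", text.strip())
    out = []
    for block in blocks:
        # Collapse the hard line breaks left by OCR into single spaces.
        block = " ".join(block.split())
        if block:
            # Escape &, < and > so the result is valid HTML; quotes are left alone.
            out.append("<p>" + html.escape(curl_quotes(block), quote=False) + "</p>")
    return "\n".join(out)
```

For example, `to_html_paragraphs('"Hello,"\nhe said.\n\nFish & chips.')` gives two `<p>` lines with curly quotes and `&amp;` in place of the ampersand.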
Finally, I use Sigil for the ebook creation, application of CSS, spell check etc.
Over the last 70+ books I've treated like this, my average time to completion has been just short of 10 hours. However, a thorough read-through in 'recreational' mode will then reveal all the little things I've missed, so there is another hour or so in Sigil after the first read-through.