Thread: OCR engine
04-06-2014, 01:10 PM   #37
alecE
Evangelist
Posts: 412
Karma: 546196
Join Date: Mar 2009
Location: UK canal boat
Device: sony prs505, prs650, kobo Glo HD liseuses
I buy old paperbacks specifically for destructive ebook creation. Covers are removed and the book is split into the publisher's signatures. Then the glue/gutter is removed. I use a Canon P-150 scanner, which feeds automatically and scans both sides. ABBYY FineReader works extremely well for the OCR process. However, OCR is never a complete success - words hyphenated over two pages, phrases in italics, poor quality original typescript etc. all require an extended bout of editing.
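For anyone who'd rather script the "words hyphenated over two pages" fix than do it by hand, here is a rough Python sketch. It is not part of my own workflow; it assumes the OCR text is available as one string per page, and the function name join_pages is made up for illustration. It will also wrongly merge genuine compounds (e.g. "well-" / "known") that happen to break at a page end, so the results still need a read-through.

```python
import re

def join_pages(pages):
    """Concatenate page texts, re-joining a word split by a hyphen at a page break.

    Assumption: `pages` is a list of strings, one per scanned page.
    """
    text = ""
    for page in pages:
        page = page.strip()
        if not page:
            continue  # skip blank pages
        # Previous page ends in "letter + hyphen" and this page starts lowercase:
        # treat it as one word broken across the page boundary.
        if re.search(r"[A-Za-z]-$", text) and page[0].islower():
            text = text[:-1] + page
        elif text:
            text = text + " " + page
        else:
            text = page
    return text

if __name__ == "__main__":
    print(join_pages(["It was a remark-", "able day.", "Then night fell."]))
```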
I use Notepad++ for the basic editing: converting the text to HTML, amending quotation marks, correcting capitalisation and paragraphing (my regex skills are s l o w l y improving).
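The same sort of clean-up can be sketched in Python instead of Notepad++ search-and-replace. This is only an illustration, not what I actually run: it assumes paragraphs in the OCR text are separated by blank lines, the function names curl_quotes and to_html_paragraphs are invented here, and the quote rules are deliberately simple (a leading apostrophe as in 'tis will come out as an opening quote).

```python
import html
import re

def curl_quotes(text):
    """Replace straight quotes with curly ones using simple position rules."""
    # A double quote at the start, or after whitespace/an opening bracket, opens.
    text = re.sub(r'(^|[\s(\[])"', "\\1\u201c", text)
    # Any double quote left over closes.
    text = text.replace('"', "\u201d")
    # Same idea for single quotes; the leftovers cover apostrophes too.
    text = re.sub(r"(^|[\s(\[\u201c])'", "\\1\u2018", text)
    text = text.replace("'", "\u2019")
    return text

def to_html_paragraphs(text):
    """Wrap blank-line-separated paragraphs in <p> tags."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    out = []
    for para in paragraphs:
        para = " ".join(para.split())          # collapse line breaks and extra spaces
        para = html.escape(para, quote=False)  # escape &, < and > but leave quotes
        out.append("<p>" + curl_quotes(para) + "</p>")
    return "\n".join(out)

if __name__ == "__main__":
    sample = '"Hi," he said.\nIt\'s late.\n\nNext para.'
    print(to_html_paragraphs(sample))
```

The output is bare paragraphs only; chapter headings, italics and the CSS still have to be handled in the ebook editor.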
Finally, Sigil for the ebook creation, application of CSS, spell check etc.

Over the last 70+ books I've treated like this, my average time to completion has been just short of 10 hours. However, a thorough read-through in 'recreational' mode will then reveal all the little things I've missed - so another hour or so in Sigil after the first read-through.