Old 07-27-2011, 10:14 AM   #10
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
I never set it to process automatically. After reading in the PDF, I run the automatic scanning. I then check each page to see if the text area is correct. Sometimes (especially with older books) some parts of the text are recognized as images.
Then I *always* use training patterns. I train for about two pages and then let ABBYY process the rest. After finishing, I transfer the lot to Word (without headers and footers; it has never missed a beat there, except for the occasional page number) and run some macros there to correct a lot of standard errors. Then I run the spell check and check the layout. I don't worry too much about the layout, since I will use stylesheets anyway.
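For anyone who would rather script this cleanup step outside Word, here is a minimal Python sketch of the idea. The rules in `RULES` are hypothetical examples of typical OCR slips; the post does not list what the actual Word macros correct, so treat them as placeholders to replace with your own.

```python
import re

# Hypothetical correction rules (assumptions, not the macros from the post).
RULES = [
    (re.compile(r"[ \t]{2,}"), " "),              # collapse runs of spaces/tabs
    (re.compile(r"[ \t]+([,.;:!?])"), r"\1"),     # drop space before punctuation
    (re.compile(r"\btbe\b"), "the"),              # "the" misread as "tbe"
    (re.compile(r"(?<=[a-z])1(?=[a-z])"), "l"),   # digit 1 inside a word -> letter l
]

def clean_ocr(text):
    """Apply each correction rule in order and return the cleaned text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(clean_ocr("He took tbe  book , s1owly ."))
```

Rules like these are blunt instruments, so a spell check and a manual pass are still needed afterwards.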
When this is finished, I turn it into an HTML file, either with a macro or via 'filtered HTML', and load it into Sigil. There I do my final work.
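If the cleaned text is available as a list of paragraphs, the HTML step can also be sketched in a few lines of Python. This is only an illustration under that assumption; the function name `paragraphs_to_html` and its `title` parameter are made up here, and it is not what Word's 'filtered HTML' export produces.

```python
from html import escape

def paragraphs_to_html(paragraphs, title="Untitled"):
    """Wrap each paragraph in <p> tags inside a bare HTML page.

    Styling is deliberately left out; it would come from a stylesheet later.
    """
    body = "\n".join("<p>{}</p>".format(escape(p)) for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        "<title>{}</title></head>\n"
        "<body>\n{}\n</body></html>\n"
    ).format(escape(title), body)

print(paragraphs_to_html(["Tom & Jerry sat down.", "Nobody spoke."], title="Sample"))
```

The resulting file contains plain paragraphs only, which suits a workflow where the layout is handled by stylesheets in Sigil.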
Depending on the book, it usually takes me about 4-6 hours for a normal novel. When I read the book, I proofread it and fix the final issues, usually about 4-10 per novel.

Most things you mention are 'normal' and expected OCR faults. Another one is incorrectly identified paragraphs. That is why I export to Word with line breaks intact. The macro I use will transform them back into paragraphs. When I check the layout, I also check for sentences within a paragraph that just happen to end exactly at the end of a line.
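To show why those sentences need a manual check, here is a rough Python sketch of one way to rebuild paragraphs from line-broken text. The heuristic is an assumption, not necessarily what the Word macro does: a line that ends in sentence-closing punctuation is taken as the end of a paragraph, and all other lines are joined to the next one.

```python
CLOSERS = "\"')\u201d\u2019"   # quotes/brackets that may follow the final period
ENDINGS = (".", "!", "?")

def join_lines(lines):
    """Rebuild paragraphs from individual lines (assumed heuristic, see above)."""
    paragraphs, current = [], []
    for line in lines:
        line = line.strip()
        if not line:                      # blank line: always a paragraph break
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        if line.rstrip(CLOSERS).endswith(ENDINGS):
            paragraphs.append(" ".join(current))
            current = []
    if current:                           # trailing text without final punctuation
        paragraphs.append(" ".join(current))
    return paragraphs

print(join_lines(["It was a dark and", "stormy night.", "The rain fell in", "torrents."]))
```

With this heuristic, a sentence in the middle of a paragraph that happens to end exactly at the end of a line is wrongly treated as a paragraph break, which is the case that has to be caught by eye during the layout check.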