I never set it to process automatically. After loading the PDF, I let it do the automatic layout analysis, then check each page to see whether the text areas are correct. Sometimes (especially with older books) parts of the text are detected as images.
Then I *always* use pattern training. I train it on about two pages and then let Abbyy process the rest. When it's done, I transfer the lot to Word (without headers and footers; it has never missed a beat there, apart from the occasional page number) and run some macros to correct a lot of common recurring errors. Then I run the spell check and check the layout. I don't worry too much about the layout, since I will use stylesheets anyway.
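The Word macros themselves aren't shown here. Purely as an illustration of the same kind of batch find-and-replace, here is a small Python sketch. The rule list is hypothetical (a few typical OCR confusions such as ligature characters and "h" misread as "li"), not the actual macros; in practice you would grow the list from the errors you keep seeing.

```python
import re

# Hypothetical rule list: (pattern, replacement) pairs applied in order.
# These are illustrative OCR confusions, not the author's actual macros.
OCR_FIXES = [
    (re.compile("\ufb01"), "fi"),          # "ﬁ" ligature -> "fi"
    (re.compile("\ufb02"), "fl"),          # "ﬂ" ligature -> "fl"
    (re.compile(r"\btlie\b"), "the"),      # "h" misread as "li"
    (re.compile(r" +([,.;:!?])"), r"\1"),  # stray space(s) before punctuation
    (re.compile(r" {2,}"), " "),           # collapse runs of spaces
]


def fix_ocr_errors(text: str) -> str:
    """Apply each find/replace rule in turn, like a chain of replace macros."""
    for pattern, replacement in OCR_FIXES:
        text = pattern.sub(replacement, text)
    return text


print(fix_ocr_errors("It was tlie \ufb01rst  day , he said ."))
# → It was the first day, he said.
```

Order matters: the punctuation rule runs before the space-collapsing rule so that "day ," loses its stray space first.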
When this is finished, I convert it to an HTML file, either with a macro or by saving it as 'filtered HTML', and load it into Sigil, where I do the final work.
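For the HTML step, here is a minimal Python sketch of what such a macro might produce, assuming the text is already split into clean paragraphs. Each paragraph becomes a bare `<p>` element with no inline formatting, and all styling is left to an external stylesheet. The file name `style.css` and the function name are placeholders, not anything Sigil requires.

```python
import html


def paragraphs_to_html(paragraphs: list[str], title: str,
                       css_href: str = "style.css") -> str:
    """Wrap each non-empty paragraph in <p> and link an external stylesheet.

    css_href is a hypothetical placeholder path.
    """
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs if p.strip())
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        "<!DOCTYPE html>\n"
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head>\n<title>{html.escape(title)}</title>\n"
        f'<link rel="stylesheet" type="text/css" href="{html.escape(css_href)}"/>\n'
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


print(paragraphs_to_html(["Hello & goodbye.", "Second paragraph."], "Chapter 1"))
```

Keeping the markup this bare is what makes the "I will use stylesheets anyway" approach work: there is nothing in the text to fight the CSS.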
Depending on the book, a normal novel usually takes me about 4-6 hours. When I read the book, I proofread it and fix the remaining issues, usually about 4-10 per novel.
Most of the things you mention are 'normal', expected OCR faults. Another one is incorrectly identified paragraphs. That is why I export to Word with line breaks intact; the macro I use transforms them back into paragraphs. When I check the layout, I also look for sentences that happen to end exactly at the end of a line, since the macro can mistake those for the end of a paragraph.
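The line-joining macro is Word VBA and isn't shown, so here is a rough Python sketch of one common heuristic for the same job, under these assumptions: a blank line always ends a paragraph; a line that ends in sentence punctuation and is clearly shorter than the longest line also ends one; everything else is joined to the next line. A full-width line that ends a sentence is exactly the ambiguous case described above, so the sketch keeps it joined and reports its index for a manual check. The 0.9 threshold is an arbitrary assumption, and end-of-line hyphenation is not handled.

```python
import re

# A line "looks finished" if it ends with . ! ? or an ellipsis,
# optionally followed by a closing quote.
SENTENCE_END = re.compile(r"[.!?\u2026][\"'\u201d\u2019]?$")


def join_lines(lines: list[str], short_ratio: float = 0.9) -> tuple[list[str], list[int]]:
    """Rebuild paragraphs from OCR text exported with line breaks intact.

    Returns the paragraphs and the indices of ambiguous lines: full-width
    lines that end a sentence and may or may not end a paragraph.
    """
    lines = [ln.strip() for ln in lines]
    width = max((len(ln) for ln in lines), default=0)  # estimated full line width
    paragraphs: list[str] = []
    current: list[str] = []
    ambiguous: list[int] = []
    for i, line in enumerate(lines):
        if not line:  # blank line: always a paragraph break
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        if SENTENCE_END.search(line):
            if len(line) < short_ratio * width:
                # Short line ending a sentence: treat as the paragraph's last line.
                paragraphs.append(" ".join(current))
                current = []
            else:
                # Sentence ends exactly at a full line: keep joining, flag it.
                ambiguous.append(i)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs, ambiguous
```

The flagged indices correspond to the spots you would otherwise have to find by eye while checking the layout.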