Quote:
Originally Posted by HarryT
These are "normal" OCR errors, where the shapes of letters look similar, e.g. "clock" instead of "dock" ("cl" and "d" are very difficult for OCR to tell apart). A spell-checker won't help, because they are real words - just not the right word.
A decent OCR program has an accuracy rate of better than 99.9%, but a typical page has around 2000 characters on it, so that means about 2 character errors per page. Some of these the OCR program's spell-checker will fix for you, but some it will get wrong.
|
Exactly. Given how well-used Abbyy is, it's almost ALWAYS things like "hat" for "fiat", and the like. No spellchecker will find those. And formatting errors? A BOATLOAD more than 1-2 per page. Almost all the "work" is in fixing the formatting, cleaning up the text, removing spans, and all that.
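To put rough numbers on that, here's a minimal Python sketch. The accuracy and characters-per-page figures are the ones from HarryT's post. The tiny word list and the function names are made up purely for illustration and don't reflect how any real OCR package or spell-checker works internally.

```python
def expected_errors_per_page(accuracy, chars_per_page):
    """Expected number of misread characters on one page."""
    return (1.0 - accuracy) * chars_per_page


# Hypothetical toy word list standing in for a real spell-checker dictionary.
DICTIONARY = {"the", "ship", "reached", "dock", "clock", "at", "noon"}


def flagged_words(text, dictionary):
    """Return the words a dictionary-only spell-check would flag."""
    return [word for word in text.lower().split() if word not in dictionary]


# "d" misread as "cl": a real word, just not the right one.
ocr_output = "the ship reached the clock at noon"
# A misread that does NOT produce a real word, for comparison.
garbled = "the ship reached the dcck at noon"

print(round(expected_errors_per_page(0.999, 2000), 2))  # 2.0
print(flagged_words(ocr_output, DICTIONARY))            # []
print(flagged_words(garbled, DICTIONARY))               # ['dcck']
```

The "clock" line sails through with nothing flagged, which is exactly why those errors only get caught by a human reading the text.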
This part, from rumpumple1:
Quote:
...and to convert them to epub with little effort
|
Simply doesn't exist. There's no such thing. The only files that can be "convert[ed] to epub with little effort" are those that are clean to begin with, which comes down to what you've done between the Scan/OCR and the time you start to actually make the ePUB.
Hitch