Old 02-05-2015, 07:32 PM   #11
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by HarryT View Post
These are "normal" OCR errors, where the shapes of letters look similar, e.g. "clock" instead of "dock" ("cl" and "d" are very difficult for OCR to tell apart). A spell-checker won't help, because they are real words - just not the right word.

A decent OCR program has an accuracy rate of better than 99.9%, but a typical page has around 2000 characters on it, so that means about 2 character errors per page. Some of these the OCR program's spell-checker will fix for you, but some it will get wrong.
Exactly. Given how widely used ABBYY is, it's almost ALWAYS things like "hat" for "fiat," and the like. No spellchecker will find those. And formatting errors? A BOATLOAD more than 1-2 per page. Almost all the "work" is in fixing the formatting, cleaning up the text, removing spans, and all that.
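
For anyone wondering why a spellchecker sails right past these, here is a rough Python sketch of the kind of check that would at least flag them for a human to look at. Everything in it is an assumption for illustration: the `CONFUSIONS` list, the `suspects` function name, and the tiny word set standing in for a real dictionary. It is not how ABBYY or any OCR package works internally.

```python
import re

# Hypothetical look-alike letter groups; a real list would be much longer.
CONFUSIONS = [("cl", "d"), ("h", "fi"), ("rn", "m")]

def suspects(text, dictionary):
    """Return (word, alternative) pairs where swapping one look-alike
    letter group turns a valid word into another valid word."""
    found = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        if lower not in dictionary:
            continue  # a misspelling; an ordinary spellchecker catches this
        for left, right in CONFUSIONS:
            # Try the swap in both directions, at every position.
            for a, b in ((left, right), (right, left)):
                start = lower.find(a)
                while start != -1:
                    candidate = lower[:start] + b + lower[start + len(a):]
                    if candidate != lower and candidate in dictionary:
                        found.append((word, candidate))
                    start = lower.find(a, start + 1)
    return found

# Toy dictionary, purely for demonstration.
words = {"the", "ship", "left", "clock", "dock", "hat", "fiat"}
print(suspects("The ship left the clock", words))
```

Note that this only produces a list of suspects. It cannot tell you which word the page actually said, so somebody still has to read the sentence.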

This part, from rumpumple1:

Quote:
...and to convert them to epub with little effort
Simply doesn't exist. There's no such thing. The only files that can be "convert[ed] to epub with little effort" are those that are clean to begin with, which comes down to the cleanup you've done between the scan/OCR and the time you actually start making the ePUB.
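
To give a feel for one small piece of that cleanup, here is a minimal Python sketch of stripping span tags out of OCR-exported HTML. It assumes the spans carry no formatting you want to keep (no italics or small caps hiding in a class), which is often NOT true, and the `strip_spans` name is made up for this example. A regex is fine for a quick pass like this; anything messier deserves a proper HTML parser.

```python
import re

# Matches opening and closing span tags, with or without attributes.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)

def strip_spans(html):
    """Remove span tags but keep the text they wrapped."""
    return SPAN_TAG.sub("", html)

sample = '<p><span class="f1">The </span><span class="f2">dock</span></p>'
print(strip_spans(sample))
```

That handles one class of junk. The rest of the formatting cleanup is still manual, judgment-call work.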

Hitch