Quote:
Originally Posted by rumpumpel1
That's amazing: even for simple layouts and plain text there is a remaining error rate of one error per page? What kind of errors are these? Can they be corrected with the spell checker of a decent office program?
These are "normal" OCR errors, where the shapes of letters look similar, e.g. "clock" instead of "dock" ("cl" and "d" are very difficult for OCR to tell apart). A spell-checker won't help, because the results are real words - just not the right words.
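To illustrate why a dictionary lookup misses this kind of error, here is a minimal sketch. The tiny word list and the `misspelled` helper are made up for the example and stand in for a real spell-checker, which would use a full dictionary.

```python
# Hypothetical, tiny word list standing in for a real spell-checker dictionary.
DICTIONARY = {"the", "ship", "left", "dock", "clock", "at", "noon"}

def misspelled(text):
    """Return the words in `text` that are not in the dictionary."""
    return [word for word in text.lower().split() if word not in DICTIONARY]

intended = "the ship left the dock at noon"
ocr_output = "the ship left the clock at noon"  # "d" misread as "cl"

# Both sentences pass the check, because "clock" is a valid word too.
print(misspelled(intended))
print(misspelled(ocr_output))

# A non-word error, by contrast, is flagged.
print(misspelled("the shlp left the dock"))
```

Only the last call reports anything, because "shlp" is not a word. Catching "clock" for "dock" needs context, not just a word list.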
A decent OCR program has a character accuracy rate of better than 99.9%, but a typical page has around 2000 characters on it, so at 99.9% that still means about 2 character errors per page. The OCR program's own spell-checker will fix some of these for you, but it will get others wrong.
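The arithmetic behind that estimate can be sketched as follows. The function name is just for illustration, and the figures are the rough ones from above (99.9% accuracy, about 2000 characters per page).

```python
def expected_errors_per_page(accuracy, chars_per_page):
    """Expected number of wrongly recognised characters on one page."""
    error_rate = 1.0 - accuracy
    return error_rate * chars_per_page

# 0.1% of 2000 characters is about 2 characters.
errors = expected_errors_per_page(accuracy=0.999, chars_per_page=2000)
print(round(errors, 2))
```

By the same sum, getting down to one error per page at 2000 characters would need 99.95% accuracy.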