Quote:
Originally Posted by AJ Starr
...But, provided my scanned pdf is clear, the ocr'd text is about 95% accurate.
Maybe you're just guesstimating the accuracy, but 95% is not good. 95% for characters is terrible, and 95% for words is only marginally acceptable. A typical printed page has something like 50 characters per line and 40 lines per page, so about 2000 characters per page. A 95% success rate per character would leave about 100 bad characters per page. At roughly 4 or 5 characters per word, that page holds 400 to 500 words, so a 95% success rate per word would bring that down to about 20 or 25 bad words per page. Even 99% accuracy, which still means about 20 bad characters per page, produces more errors than most people like. You'd have to get to about 99.9% accuracy, or around 2 bad characters per page, before you could think about not proofing the text afterwards.
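If you want to plug in your own numbers, here is a rough sketch of the arithmetic in Python. The page dimensions are the ones from the post; `CHARS_PER_WORD = 5` is a round figure picked for illustration, and the function name `expected_errors` is made up for this example. It treats every character or word as equally likely to be wrong, which real OCR output won't quite match.

```python
# Back-of-the-envelope estimate of OCR errors per page.
CHARS_PER_LINE = 50   # from the post
LINES_PER_PAGE = 40   # from the post
CHARS_PER_WORD = 5    # assumption: average word length including the space


def expected_errors(units_per_page, accuracy):
    """Expected number of wrong units (characters or words) on one page."""
    return units_per_page * (1.0 - accuracy)


chars_per_page = CHARS_PER_LINE * LINES_PER_PAGE      # 2000
words_per_page = chars_per_page // CHARS_PER_WORD     # 400 under the assumption

for accuracy in (0.95, 0.99, 0.999):
    bad_chars = expected_errors(chars_per_page, accuracy)
    bad_words = expected_errors(words_per_page, accuracy)
    print(f"{accuracy:.1%} accurate: "
          f"~{bad_chars:.0f} bad characters/page if measured per character, "
          f"~{bad_words:.0f} bad words/page if measured per word")
```

Change the three constants to match your own scans. A dense page with 3000 characters will push every figure up by half.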