Thread: OCR engine
Old 03-20-2014, 07:19 PM   #6
rkomar
Wizard
Posts: 3,055
Karma: 18821071
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
Quote:
Originally Posted by AJ Starr
...But, provided my scanned pdf is clear, the ocr'd text is about 95% accurate.
Maybe you're just guesstimating the accuracy, but 95% is not good. 95% for characters is terrible, and 95% for words is only marginally acceptable. A typical printed page has something like 50 characters per line and 40 lines per page, so about 2000 characters per page. A 95% success rate per character would leave about 100 bad characters per page. A 95% success rate per word would bring that down to about 20 or 25 bad words per page. Even 99% accuracy still means about 20 bad characters per page, which is more errors than most people like. You'd have to get to about 99.9% accuracy (about 2 bad characters per page) before you could think about not proofing the text afterwards.
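If you want to check the arithmetic for other accuracy figures, here is a rough sketch. The 50 x 40 page comes from the estimate above; the 5 characters per word (counting the trailing space) is my own assumption, and the function and constant names are made up for illustration.

```python
def errors_per_page(accuracy, units_per_page):
    """Expected number of wrong units (characters or words) on one page."""
    return (1.0 - accuracy) * units_per_page


CHARS_PER_LINE = 50
LINES_PER_PAGE = 40
CHARS_PER_PAGE = CHARS_PER_LINE * LINES_PER_PAGE  # about 2000 characters

# Assumption (not from the post): roughly 5 characters per word,
# counting the space that follows it.
CHARS_PER_WORD = 5
WORDS_PER_PAGE = CHARS_PER_PAGE // CHARS_PER_WORD  # about 400 words

if __name__ == "__main__":
    for accuracy in (0.95, 0.99, 0.999):
        # The same accuracy figure is read two ways: per character or per word.
        bad_chars = errors_per_page(accuracy, CHARS_PER_PAGE)
        bad_words = errors_per_page(accuracy, WORDS_PER_PAGE)
        print(
            f"{accuracy:.1%} accurate: "
            f"~{bad_chars:.1f} bad characters per page if measured per character, "
            f"~{bad_words:.1f} bad words per page if measured per word"
        )
```

With those assumptions, 95% works out to about 100 bad characters or about 20 bad words per page. A shorter average word length pushes the word count, and so the bad-word count, up toward 25.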