Old 06-30-2011, 05:28 PM   #3
Elfwreck
Grand Sorcerer
Posts: 5,140
Karma: 23571382
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Clié; PRS-505; EZR Pocket Pro, PRS-600, Kobo Mini
For regular, clean-font books, it's over 99% accurate; with good scans there are often 2-4 pages between errors, and those tend to be things like names that aren't in its internal dictionaries. And that's FineReader 7; version 10 should be more accurate, because OCR tech has improved in the last six years.

Note: 99% accuracy would mean one character wrong every couple of sentences. FineReader does a lot better than that. Even 99.9% is still roughly one character wrong per page.
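
To put rough numbers on that, here's a small sketch of the arithmetic. The figure of 1,800 characters per page is my assumption for a typical printed page, not something measured from this book; a sparser page of about 1,000 characters gives the "one wrong character per page" figure at 99.9%.

```python
def errors_per_page(accuracy_percent: float, chars_per_page: int = 1800) -> float:
    """Expected number of wrong characters per page at a given character accuracy.

    chars_per_page is an assumed page density, not a measured value.
    """
    return chars_per_page * (100.0 - accuracy_percent) / 100.0


def accuracy_for_error_gap(pages_between_errors: float, chars_per_page: int = 1800) -> float:
    """Character accuracy (in percent) implied by one error every N pages."""
    return 100.0 * (1.0 - 1.0 / (pages_between_errors * chars_per_page))


if __name__ == "__main__":
    for acc in (99.0, 99.9):
        print(f"{acc}% accurate: about {errors_per_page(acc):.1f} wrong characters per page")
    for pages in (2, 4):
        print(f"one error every {pages} pages: about {accuracy_for_error_gap(pages):.3f}% accurate")
```

Under that assumption, an error only every 2-4 pages works out to well above 99.9% character accuracy, which is why "99%" undersells it.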

I'm attaching a sample from a public-domain book that I'm in the process of converting. (I'm done with the main text; am trying to decide how/whether to deal with the index.)

This was scanned at 600 dpi, so the scans are good, but the font is older than in current books and the line spacing is tighter than in a lot of modern books. Attached are a 10-page PDF extract and the Word output from FineReader 7, with no corrections and with all the auto-detected formatting kept.

Normally, I wouldn't keep the line breaks. I'd also try to remove the headers & page numbers before OCR; I've got various ways to do that but FR 10 might have better ones. And I'd run through FineReader's internal correction process, which is easier to deal with than comparing the Word export to the PDF and making changes that way.
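
If the line breaks do end up in a plain-text export, they can be stripped afterwards. This is only a minimal sketch of one way to do it in Python, not anything FineReader does itself; the function name is made up for illustration, and it assumes paragraphs are separated by blank lines. It also rejoins every word hyphenated at a line end, so a genuinely hyphenated word that happened to break there would lose its hyphen.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into paragraphs (hypothetical helper).

    Assumes blank lines separate paragraphs in the exported text.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    unwrapped = []
    for para in paragraphs:
        # Rejoin words that were hyphenated across a line break.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Turn the remaining single line breaks into spaces.
        para = re.sub(r"\s*\n\s*", " ", para)
        unwrapped.append(para.strip())
    return "\n\n".join(unwrapped)


if __name__ == "__main__":
    sample = (
        "The scans were good, and the recog-\n"
        "nition was accurate\n"
        "throughout.\n"
        "\n"
        "A second paragraph\n"
        "follows."
    )
    print(unwrap_paragraphs(sample))
```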
Attached Files
File Type: pdf CONANT_sample pgs.pdf (947.9 KB, 79 views)
File Type: doc CONANT_sample pgs.doc (31.7 KB, 72 views)

Last edited by Elfwreck; 06-30-2011 at 05:31 PM.