03-20-2014, 08:37 AM   #12
HarryT
eBook Enthusiast
HarryT ought to be getting tired of karma fortunes by now.
Posts: 85,557
Karma: 93980341
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by fjtorres
The one where the scan is clean, aligned, and the software is good.

You're just looking for formatting artifacts to get a clean document, which is what a publisher needs as feedstock for their normal workflow.

It's doable and affordable.
Honest.
You can get "good" like that, but certainly not "perfect". I've done a heck of a lot of OCR with all sorts of different equipment, from hobbyist scanners to pro kit. Even the best OCR packages don't claim an accuracy of more than about 99.99%, which means that, on average, 1 character in 10,000 will be wrong. At roughly 2,000 characters per page, that's an average of 1 error in 5 pages, which equates to about 80 scanning errors in a 400-page book. Punctuation tends to be far less accurate than text - it's all too easy for a scanner to miss a comma or an apostrophe, or to see dirt on the page as a spurious one.
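
For anyone who wants to check the arithmetic, here's a rough sketch in Python. The 2,000 characters per page is an assumption (it's simply what "1 error in 5 pages" implies at 1 wrong character in 10,000), and the function names are made up for illustration, so plug in your own numbers.

```python
# Rough estimate of OCR errors from a quoted accuracy figure.
# Assumption: about 2,000 characters per page; real books vary a lot.

def expected_ocr_errors(pages, chars_per_page=2000, chars_per_error=10_000):
    """Average number of wrong characters in a book of the given length."""
    total_chars = pages * chars_per_page
    return total_chars / chars_per_error


def pages_per_error(chars_per_page=2000, chars_per_error=10_000):
    """Average number of pages between one error and the next."""
    return chars_per_error / chars_per_page


if __name__ == "__main__":
    # 99.99% accuracy means 1 wrong character in 10,000.
    print(pages_per_error())         # 5.0
    print(expected_ocr_errors(400))  # 80.0
```

This is only an average over clean text; it says nothing about punctuation, which tends to come out worse.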

I imagine you did your work with laser-printed documents? These are going to be a lot cleaner - and hence more accurately scanned - than grubby old paper books.

Last edited by HarryT; 03-20-2014 at 08:40 AM.