Quote:
Originally Posted by fjtorres
The one where the scan is clean, aligned, and the software is good.
You're just looking for formatting artifacts to get a clean document, which is what a publisher needs as feedstock for their normal workflow.
It's doable and affordable.
Honest.
You can get "good" like that, but certainly not "perfect". I've done a heck of a lot of OCR with all sorts of equipment, from hobbyist scanners to pro kit. Even the best OCR packages don't claim an accuracy of more than about 99.99%, which means that, on average, 1 character in 10,000 will be wrong. At roughly 2,000 characters per page, that's an average of 1 error every 5 pages, or about 80 OCR errors in a 400-page book. Punctuation tends to be far less accurate than text: it's all too easy for a scanner to miss a comma or an apostrophe, or to read dirt on the page as a spurious one.
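To make that arithmetic explicit, here's a rough back-of-the-envelope sketch in Python. It assumes about 2,000 characters per page (which is what "1 error every 5 pages" at 99.99% implies) and treats each character as an independent chance of error. Both are simplifications, and the function names are just made up for illustration.

```python
# Rough estimate of OCR errors in a scanned book.
# Assumptions (illustrative only): ~2,000 characters per page, and each
# character is independently misread with probability (1 - accuracy).

def expected_errors(accuracy: float, chars_per_page: int, pages: int) -> float:
    """Expected number of misread characters across the whole book."""
    error_rate = 1.0 - accuracy
    return error_rate * chars_per_page * pages

def pages_per_error(accuracy: float, chars_per_page: int) -> float:
    """Average number of pages between OCR errors."""
    return 1.0 / ((1.0 - accuracy) * chars_per_page)

if __name__ == "__main__":
    print(round(expected_errors(0.9999, 2000, 400)))  # 80
    print(round(pages_per_error(0.9999, 2000)))       # 5
```

Note that this only counts character-level misreads; it says nothing about the extra punctuation errors mentioned above, which would push the real number higher.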
I imagine you did your work with laser-printed documents? These are going to be a lot cleaner - and hence more accurately scanned - than grubby old paper books.