Thread: OCR engine
03-22-2014, 10:45 AM   #17
HarryT
eBook Enthusiast
Quote:
Originally Posted by Tex2002ans
OK, perhaps, if we are using your definition of "complete proof" (character-by-character A/B compare, ... there is just no economically feasible way to do this).

I mean going through multiple thorough rounds of successive formatting and quality checking, searching for and applying different fixes each round (spellcheck, ligatures/accented characters, consistent hyphenation, punctuation errors, inconsistent spelling, etc.).

Feel free to look at any of my EPUBs and let me know of any errors. While they are probably not "100%" error-free, the number of errors can probably be counted on one hand.
Your method will certainly produce a good reading copy. What it won't do - and this is very important - is find missing text. You'd probably be surprised how many books I've proof-read where the scanner has missed text at the top or the bottom of a page, or even skipped a whole double-page spread, and it's not always obvious from simply reading the text that this has happened. That's why it's so important to compare against the original rather than take the OCR'd text in isolation.
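
For illustration only, here is a minimal Python sketch of a triage check for the two kinds of loss described above. It assumes the OCR output is available page by page, together with the printed page number of each scan. That layout, the function name `flag_suspect_pages`, and the `short_ratio` threshold are all hypothetical and not taken from any OCR package. The sketch only suggests which pages to check first, and it does not replace a comparison against the original.

```python
from statistics import median


def flag_suspect_pages(pages, short_ratio=0.7):
    """Flag pages that deserve a look at the paper original.

    pages: list of (page_number, text) tuples in scan order (assumed layout).
    short_ratio: a page with fewer words than this fraction of the median
                 page is flagged (an arbitrary illustrative threshold).
    Returns a list of (page_number, reason) tuples.
    """
    flags = []
    counts = [len(text.split()) for _, text in pages]
    typical = median(counts) if counts else 0

    prev_number = None
    for (number, text), count in zip(pages, counts):
        # A jump in printed page numbers suggests a skipped page or spread.
        if prev_number is not None and number != prev_number + 1:
            flags.append((number, f"page numbers jump from {prev_number} to {number}"))
        # Far fewer words than usual may mean lines were lost at the top or bottom.
        if typical and count < short_ratio * typical:
            flags.append((number, f"only {count} words, typical page has {typical}"))
        prev_number = number
    return flags


if __name__ == "__main__":
    sample = [(1, "word " * 100), (2, "word " * 100), (3, "word " * 40),
              (6, "word " * 100), (7, "word " * 100)]
    for number, reason in flag_suspect_pages(sample):
        print(number, reason)
```

Chapter endings and pages with illustrations will be flagged as short too, and a page that lost only a line or two will pass unnoticed, so anything the check reports (and plenty it does not) still has to be compared with the printed page.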

Last edited by HarryT; 03-22-2014 at 10:50 AM.