Quote:
Originally Posted by susan_cassidy
Some ebooks have obviously not even had spell-check run on them. I've also seen lots of errors like spaces after commas, no spaces after periods, etc. where those combinations of characters should never happen.
|
Those can happen from the OCR program. Yes, they should run a few macros/scripts to clean them up.
About an hour of cleanup per novel-length book should catch 90% of the small OCR errors, like weird line breaks, punctuation spacing issues, and the occasional paragraph that ends in a comma instead of a period because there was a dust speck on the page. It won't make for perfect files, but it'd get rid of the glaring errors that even the most casual readers will notice and be annoyed by.
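For what it's worth, here's a rough sketch of the kind of cleanup script I mean, in Python. The function names and the exact rules are my own assumptions, not anything a publisher actually uses, and it only covers punctuation spacing and paragraphs that end in a comma. Treat it as a starting point to adapt per book.

```python
import re

def fix_punctuation_spacing(text):
    """Apply a few conservative spacing fixes (illustrative rules only)."""
    # Remove stray spaces before commas, semicolons, colons, ! and ?
    # (periods are left alone so spaced ellipses ". . ." survive).
    text = re.sub(r"[ \t]+([,;:!?])", r"\1", text)
    # Add a missing space after a comma or semicolon followed by a letter.
    text = re.sub(r"([,;])(?=[A-Za-z])", r"\1 ", text)
    # Add a missing space after a period that sits between a lowercase
    # letter and a capitalized word: "end.The" -> "end. The".
    text = re.sub(r"(?<=[a-z])\.(?=[A-Z][a-z])", ". ", text)
    return text

def paragraphs_ending_in_comma(text):
    """Return paragraphs (separated by blank lines) that end in a comma,
    so a human can check whether a period was misread."""
    paras = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [p for p in paras if p.endswith(",")]

if __name__ == "__main__":
    sample = "Hello ,world.The end"
    print(fix_punctuation_spacing(sample))
    print(paragraphs_ending_in_comma("He left,\n\nShe stayed."))
```

The comma check only reports paragraphs instead of changing them, since a trailing comma is occasionally legitimate (in verse or a letter salutation, say) and a person should make the call.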
Quote:
Maybe they need a script that looks at all the commonly misused homonyms and other words and displays the context for a human to double-check? Like "flair" vs. "flare", "break" vs. "brake", "stake" vs. "steak", "loose" and "lose".
|
Those aren't generally problems in OCR'd ebook versions. Instead we get:
modern/modem switches (It was a shock to read about "modem birth control methods.")
burn/bum
corn/com
(etc.)
"die" instead of "the" (which is very hard to fix by search & replace, and spellcheck won't notice it... you spot it by looking for "diere.")
"hi" instead of "In."
Periods instead of commas in front of capitalized words or acronyms, because the OCR program is just smart enough to think a capitalized word is probably the start of a sentence.
Semicolons that should be commas, because there was a dust speck on the page or the comma was in an odd font that the program read as having two parts.
Hyphenations in mid-line. Hyphens instead of en dashes or em dashes.
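The "display the context for a human" idea works fine for these OCR confusions too. Here's a minimal sketch in Python, assuming a hand-made suspect list (the SUSPECTS list and the find_suspects name are mine, built only from the examples above). It doesn't fix anything; it just shows each hit with some surrounding text so a person can decide.

```python
import re

# Hypothetical starter list taken from the examples in this post;
# a real list would be longer and tuned to the OCR program's habits.
SUSPECTS = ["modem", "bum", "com", "die", "diere", "hi"]

def find_suspects(text, words=SUSPECTS, width=30):
    """Return (matched_word, context) pairs for whole-word matches."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, words)) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for m in pattern.finditer(text):
        # Take up to `width` characters on each side of the match.
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        context = text[start:end].replace("\n", " ")
        hits.append((m.group(0), context))
    return hits

if __name__ == "__main__":
    page = 'It was a shock to read about modem birth control methods.'
    for word, context in find_suspects(page):
        print(f"{word!r}: ...{context}...")
```

Words like "die" and "hi" are real words, so expect plenty of false positives. That's why this only lists candidates rather than doing a blind search & replace.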