04-24-2015, 11:14 AM #5
theducks
Well trained by Cats
Posts: 31,227
Karma: 60807154
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2, Kobo Aura2v1, K4NT (Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by cybmole
The rationale for scrapping such sources is that they often have other hard-to-fix problems, e.g. paragraph breaks that occur mid-sentence, a missing or incomplete TOC, messed-up punctuation, annoying OCR errors...

I treat the presence of hard-coded "page numbers" as a warning sign:
"beware: crap conversion ahead"
Hard page numbers ALSO come from PDF conversions (just one of many issues; most can be fixed with a bunch of REGEX, but NOT during conversion). OCR errors will always take TLC proofing to remove.
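For anyone wondering what "a bunch of REGEX" after conversion might look like, here is a minimal Python sketch. It assumes plain text with "\n" line endings, treats any line holding only 1 to 4 digits as a stray page number, and treats a blank-line break between a lowercase letter (or comma/semicolon) and a lowercase letter as a mid-sentence paragraph break. These patterns are my own illustrative assumptions, not anything from the posts above; working on HTML (e.g. in an ebook editor) would need patterns that match the `<p>` markup instead.

```python
import re

# Assumption: a "page number" is a line containing nothing but 1-4 digits,
# optionally padded with spaces or tabs.
PAGE_NUMBER_LINE = re.compile(r"^[ \t]*\d{1,4}[ \t]*\n", re.MULTILINE)

# Assumption: a paragraph break (one or more blank lines) that follows a
# lowercase letter, comma, or semicolon and precedes a lowercase letter was
# inserted mid-sentence by the conversion.
MID_SENTENCE_BREAK = re.compile(r"(?<=[a-z,;])\n[ \t]*\n+(?=[a-z])")

# Removing page-number lines can leave runs of blank lines behind.
EXTRA_BLANK_LINES = re.compile(r"\n{3,}")


def clean_converted_text(text: str) -> str:
    """Hypothetical post-conversion cleanup: drop bare page numbers,
    rejoin paragraphs split mid-sentence, and collapse extra blank lines."""
    text = PAGE_NUMBER_LINE.sub("", text)
    text = MID_SENTENCE_BREAK.sub(" ", text)
    text = EXTRA_BLANK_LINES.sub("\n\n", text)
    return text


if __name__ == "__main__":
    sample = "It was a dark and\n\n42\n\nstormy night.\n\nThe end.\n"
    print(clean_converted_text(sample))
```

Patterns like these will also catch legitimate lines (a chapter titled "1984", a deliberate break before a lowercase word), so the result still needs a proofing pass.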