MobileRead Forums

theducks · 04-24-2015, 12:14 PM

Quote:

Originally Posted by cybmole

the rationale for scrapping such sources is that they often have other hard-to-fix problems, e.g. paragraph breaks that occur mid-sentence, a missing or incomplete TOC, messed-up punctuation, annoying OCR errors...

I treat the presence of hard coded "page numbers" as a warning sign:
"beware: crap conversion ahead"

Hard page numbers ALSO come from PDF conversions. They are just one of many issues; most can be fixed with a bunch of REGEX, but NOT during conversion. OCR errors are always going to take TLC proofing to remove.
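
To give an idea of the kind of REGEX cleanup meant here, below is a rough Python sketch to run over the converted HTML afterwards. The patterns and the `clean_html` name are my own assumptions for illustration: it assumes page numbers sit alone in their own `<p>` and that a mid-sentence break shows up as a paragraph ending without closing punctuation followed by one starting in lowercase. Real books vary, so check the matches before replacing anything.

```python
import re

# Assumed pattern: a paragraph holding nothing but a page number,
# e.g. <p>12</p> or <p class="pg">Page 12</p>.
PAGE_NUM = re.compile(r'<p\b[^>]*>\s*(?:page\s+)?\d{1,4}\s*</p>\s*', re.IGNORECASE)

# Assumed pattern: a paragraph that ends without sentence-ending
# punctuation, followed by a paragraph that starts with a lowercase letter.
BROKEN_PARA = re.compile(r'([^\s.!?"\u201d:>])\s*</p>\s*<p\b[^>]*>\s*(?=[a-z])')


def clean_html(html: str) -> str:
    """Strip stand-alone page numbers, then rejoin paragraphs split mid-sentence."""
    html = PAGE_NUM.sub('', html)          # drop the page-number paragraphs first
    html = BROKEN_PARA.sub(r'\1 ', html)   # keep the last character, join with a space
    return html


sample = (
    '<p>He walked to the</p>\n'
    '<p>12</p>\n'
    '<p>door and opened it.</p>\n'
    '<p>Next paragraph.</p>'
)
print(clean_html(sample))
```

The page numbers go first on purpose, since they are often what splits the sentence in two. None of this touches OCR errors; those still need proofing by eye.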