The rationale for scrapping such sources is that they often have other hard-to-fix problems, e.g. paragraph breaks that occur mid-sentence, a missing or incomplete TOC, messed-up punctuation, annoying OCR errors...
I treat the presence of hard-coded "page numbers" as a warning sign:
"beware: crap conversion ahead"
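If you want to automate that warning sign, a rough heuristic is to count lines that contain nothing but a page number. The sketch below assumes page numbers survive as standalone lines such as `12`, `Page 12` or `- 12 -`. The pattern, the function names and the threshold of 3 are all hypothetical choices for illustration, not an established tool:

```python
import re

# Hypothetical pattern: a line holding only a 1-4 digit number,
# optionally prefixed by "Page" or wrapped in hyphens ("- 12 -").
PAGE_NUMBER_LINE = re.compile(r"^\s*(?:page\s+)?-?\s*\d{1,4}\s*-?\s*$", re.IGNORECASE)


def count_page_number_lines(text: str) -> int:
    """Count lines that look like stray hard-coded page numbers."""
    return sum(1 for line in text.splitlines() if PAGE_NUMBER_LINE.match(line))


def looks_like_bad_conversion(text: str, threshold: int = 3) -> bool:
    """Flag the text when it has at least `threshold` page-number-only lines."""
    return count_page_number_lines(text) >= threshold


sample = "Chapter One\nIt was a dark and\n\n12\n\nstormy night.\nPage 13\n- 14 -\n"
print(count_page_number_lines(sample))    # 3
print(looks_like_bad_conversion(sample))  # True
```

A hit doesn't prove the conversion is bad, and a miss doesn't prove it's good. It only tells you the source deserves a closer look before you invest time in it.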