Converting HTM(L) format books for Kindle
I found a collection of old sci-fi books in .htm and .html format. Calibre can convert them, but where they have hard carriage returns embedded (i.e. they are set to look good only at one particular line width), they still look bad on the Kindle, because they only partially reflow.
Sending them to Amazon to convert does not work: you end up with HTML source code on your reader.
I thought I'd found a workaround: open them in MS Word, apply AutoFormat, save as .rtf, then have Calibre reconvert them. That's better, but Word's AutoFormat picks out odd bits of text, somewhat arbitrarily, and enlarges them into headings. It's not practical to check all of a 1000-page book by hand in Word, especially if you don't want to see any plot spoilers.
So I'm wondering whether there's a better way to get such books to reflow fully, or whether a future Calibre release could have a more intelligent conversion routine that strips out single carriage returns and keeps only double or triple ones, which are likely to be true paragraph ends?
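The rule described above (single line breaks are just wrapping, blank lines are real paragraph ends) is simple enough to try outside Calibre as a pre-processing step. Below is a minimal sketch, assuming the book text has already been pulled out of the HTML as plain text (for example, the contents of a <pre> block saved to a .txt file). The function names `reflow_paragraphs` and `to_html` are hypothetical, not part of Calibre.

```python
import html
import re


def reflow_paragraphs(text: str) -> list[str]:
    """Treat single newlines as soft wraps and blank lines as paragraph ends."""
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Two or more newlines, possibly with whitespace-only lines between them,
    # mark a true paragraph break; one regex split handles double and triple breaks.
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = []
    for block in blocks:
        # Collapse every run of whitespace (including the single hard wraps)
        # into one space so the paragraph can reflow freely.
        joined = re.sub(r"\s+", " ", block).strip()
        if joined:
            paragraphs.append(joined)
    return paragraphs


def to_html(paragraphs: list[str]) -> str:
    """Wrap each paragraph in <p> tags, escaping &, < and > in the text."""
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return f"<html><body>\n{body}\n</body></html>\n"
```

The output of `to_html` could then be saved as a new .html file and fed to Calibre as usual. One trade-off of this rule: poetry, letters, or dialogue laid out with single line breaks will also be merged into one paragraph, so it suits plain prose best.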
The other annoyance that is sometimes present in old conversions is repetitive headers/footers, e.g. a full file path name on every original "page", or in one case a spam "converted by program x" message and URL link. Is there any way to automate removing such things?
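Repeated headers and footers can often be caught by frequency: a file path printed on every original "page" appears far more often than any real line of the story. Below is a rough sketch of that idea, assuming the book is available as plain text. The function name, the threshold of 5 repeats, and the "converted by" pattern are all hypothetical and would need adjusting for a given book. Digits are masked before counting so that a header that includes a changing page number still counts as the same line.

```python
import re
from collections import Counter

# Hypothetical spam pattern; replace with the actual text seen in the book.
SPAM = re.compile(r"converted by", re.IGNORECASE)


def _key(line: str) -> str:
    # Mask digits so "story.htm - page 1" and "story.htm - page 2" match.
    return re.sub(r"\d+", "#", line.strip())


def strip_repeated_lines(text: str, min_repeats: int = 5, spam: re.Pattern = SPAM) -> str:
    """Drop lines that recur at least min_repeats times, plus spam lines."""
    lines = text.splitlines()
    # Count only non-blank lines; blank lines are kept to preserve paragraph breaks.
    counts = Counter(_key(line) for line in lines if line.strip())
    kept = []
    for line in lines:
        key = _key(line)
        if key and counts[key] >= min_repeats:
            continue  # likely a per-page header or footer
        if spam.search(line):
            continue  # matches the spam/"converted by" pattern
        kept.append(line)
    return "\n".join(kept)
```

The threshold is a judgement call: scene-break markers such as "* * *" may also repeat often enough to be removed, so it is worth skimming which lines are being counted before trusting the result. A paragraph that straddled an original page break may also still come out split in two.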