Well yes, in that case, you're (@shotsky) doing MUCH more than editing an ebook. Let's face it: combining everything into a single HTML file (when it wasn't in a single HTML file to begin with) is going to be a fairly involved process no matter how you go about it. IDs that used to be unique might not be unique any more; all links will need to be adjusted, and IDs may need to be created to accommodate URL fragments in those adjusted links; styles/CSS will need to be combined and/or edited. An automated tool has a lot to do to keep stuff valid (even if you don't need it all to be technically valid).
Quote:
The problem with using Calibre to make an htmlz file, which DOES put all the files into a single html file, is that it renames classes and elements in an unexpected manner, which causes the original meaning of the text that follows to be lost.
That's just it, though. It's not an "unexpected manner" if you think about it. All those things I mentioned above are WHY those changes have to happen: so that it will "work" for every case thrown at it. Also keep in mind that you're still talking about a "conversion," not an edit.
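To make that concrete, here's a rough sketch (mine, not how Calibre actually does it) of the bare minimum a merge step has to do to IDs and links before files can share one document. The function name `prefix_ids`, the `_top` anchor convention, and the sample file names are all made up for illustration, and a regex is only good enough for a demo; a real tool would use a proper parser.

```python
import re

def prefix_ids(html, prefix, filenames):
    """Prefix every id in one file and retarget links to match.

    filenames maps each source file name to the prefix used for that file.
    """
    # id="note1" -> id="ch1_note1", so two files can both have a "note1"
    html = re.sub(r'(?<![\w-])id="([^"]+)"',
                  lambda m: f'id="{prefix}_{m.group(1)}"', html)

    def fix_href(m):
        target, _, frag = m.group(1).partition("#")
        if not target and frag:
            # same-file link: #note1 -> #ch1_note1
            return f'href="#{prefix}_{frag}"'
        if target in filenames:
            other = filenames[target]
            # cross-file link: ch2.html#note1 -> #ch2_note1
            # no fragment: point at an anchor that has to be CREATED
            # at the start of that file's content (hypothetical "_top")
            return f'href="#{other}_{frag}"' if frag else f'href="#{other}_top"'
        return m.group(0)  # external link, leave it alone

    return re.sub(r'href="([^"]*)"', fix_href, html)


files = {"ch1.html": "ch1", "ch2.html": "ch2"}
ch1 = '<p id="note1">One</p><a href="ch2.html#note1">next</a>'
ch2 = '<p id="note1">Two</p><a href="#note1">self</a><a href="ch1.html">back</a>'
print(prefix_ids(ch1, "ch1", files))
print(prefix_ids(ch2, "ch2", files))
```

Both files had an `id="note1"`, so something has to get renamed, and every link pointing at it has to follow. Classes in the CSS have the same collision problem, which is why a converter ends up renaming those too.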
I'm afraid that in your case, manually merging files is probably always going to be the "best" approach (even if it is the most tedious). As soon as you look to automate portions of your process, you're going to have to sacrifice control of the output. That part never changes.
If your ultimate goal is to extract all the text, I would seriously suggest finding a way to do so that doesn't involve combining all html files into one file as the very first step.
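For what it's worth, here's a minimal sketch of one way to do that: walk the epub's spine and pull the text out of each file in reading order, with no merge step at all. It uses only the Python standard library. It assumes a DRM-free epub whose content files are UTF-8, and the names `epub_text` and `TextOnly` are just mine. It also flattens each file to a single run of text, so adapt it if you need paragraph breaks.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import unquote

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}

class TextOnly(HTMLParser):
    """Collect text, ignoring anything inside head/script/style."""
    SKIP = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

def epub_text(path):
    """Yield (filename, text) for each spine item, in reading order."""
    with zipfile.ZipFile(path) as z:
        # container.xml says where the OPF package file lives
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(z.read(opf_path))
        # manifest: id -> href (hrefs are relative to the OPF's folder)
        manifest = {item.get("id"): item.get("href")
                    for item in opf.findall("opf:manifest/opf:item", NS)}
        base = posixpath.dirname(opf_path)
        # spine: the reading order, as a list of manifest ids
        for ref in opf.findall("opf:spine/opf:itemref", NS):
            href = unquote(manifest[ref.get("idref")])
            name = posixpath.normpath(posixpath.join(base, href))
            parser = TextOnly()
            parser.feed(z.read(name).decode("utf-8"))  # assumes UTF-8
            parser.close()
            # collapse whitespace so each file becomes one clean string
            yield name, " ".join("".join(parser.parts).split())
```

Usage would be something like `for name, text in epub_text("book.epub"): print(name, len(text))`, where `book.epub` is a placeholder for your own file. Since each file is handled separately, nothing gets renamed and you keep full control over what happens to the text afterwards.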