Just a quick update: ebook-convert's HTMLZ output format is not suitable at all, because it renames the images sequentially, starting at 00000 and incrementing by 1 for each successive image. Pandoc is way better for this use case, but now I am running into the fact that the KF8 text refers to one more image than the KFX text. So I will have to look at the surrounding text to see why that is, and whether it is trivial to work around, e.g., whether I can just discard the first or last image and have everything else match. Regular expressions and Notepad++ functions are really helping here, but this is definitely not easy to automate.
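For anyone attempting the same thing, here is a rough Python sketch of the "discard the first or last image and check that everything else matches" step. It assumes both books have already been converted to single HTML files (e.g. with Pandoc) and that the visible text is essentially identical between the KF8 and KFX versions. The helper names `images_with_context` and `align`, the 40-character fingerprint of preceding text, and the sample file names are all my own hypothetical choices, not anything provided by Pandoc or Calibre, and the regexes are a quick approximation rather than a real HTML parser.

```python
import html
import re

# Rough regexes, not a full HTML parser: assumes well-formed <img> tags.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SRC_ATTR = re.compile(r"""\ssrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")
KEY_LEN = 40  # hypothetical: characters of preceding text used as a fingerprint


def images_with_context(markup):
    """Return (src, fingerprint) for every <img> in document order.

    The fingerprint is the last KEY_LEN characters of visible text before the
    image, with tags removed, entities decoded and whitespace collapsed.
    """
    result = []
    tail = ""  # rolling buffer holding the most recent normalized text
    pos = 0
    for m in IMG_TAG.finditer(markup):
        chunk = html.unescape(ANY_TAG.sub(" ", markup[pos:m.start()]))
        # Normalizing buffer + new chunk gives the same suffix as normalizing
        # the whole document so far, without rescanning it each time.
        tail = re.sub(r"\s+", " ", tail + chunk)[-2 * KEY_LEN:]
        src = SRC_ATTR.search(m.group(0))
        result.append((src.group(1) if src else None, tail.strip()[-KEY_LEN:]))
        pos = m.end()
    return result


def align(longer, shorter):
    """Pair images from two renderings whose image counts differ by one.

    Tries dropping the first, then the last, entry of `longer`. Returns a list
    of (longer_src, shorter_src) pairs if every fingerprint then matches,
    otherwise None.
    """
    if len(longer) != len(shorter) + 1:
        return None
    for candidate in (longer[1:], longer[:-1]):
        if all(a[1] == b[1] for a, b in zip(candidate, shorter)):
            return [(a[0], b[0]) for a, b in zip(candidate, shorter)]
    return None


if __name__ == "__main__":
    # Made-up sample markup standing in for the two Pandoc HTML files.
    kf8_html = ('<p>Intro</p><img src="cover.jpg"/><p>Chapter one text.</p>'
                '<img src="a.jpg"/><p>More text.</p><img src="b.jpg"/>')
    kfx_html = ('<p>Intro</p><p>Chapter one text.</p><img src="kfx_1.jpg"/>'
                '<p>More text.</p><img src="kfx_2.jpg"/>')
    print(align(images_with_context(kf8_html), images_with_context(kfx_html)))
```

If `align` returns None, neither the first nor the last image is the odd one out, and the mismatch needs a manual look at the surrounding text. The fingerprint length is a trade-off: too short and unrelated images can collide, too long and small markup differences between the two versions break the match.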
Should I just let this topic be (i.e., post no further reports), since having the best of both worlds seems to be a niche use case that no one else really needs? In other words, this information would only be useful for someone like me who wants the highest-resolution images possible while also keeping the semantic information. Everyone else is probably satisfied with the KFX output, since most users are likely to run it through Calibre anyway, which bloats the markup a bit and never leaves it untouched, no matter what arguments are used. Even if I succeed, it's not like it'll help anyone else out...