Quote:
Originally Posted by eschwartz
Normally the regex is applied to the raw contents of the input format, i.e. unzipped EPUB/AZW3 (X)HTML. But PDF is, ah, complicated, so it has to be turned into HTML before you can convert that HTML to something else.
Thanks for these clarifications. In the end I gave up and cut and pasted the HTML into Sublime Text and RegexRX, where I wrote a looser regex and managed to get rid of all the footers. Some of the issues stemmed from the fact that Calibre used different HTML classes for the same footer.
This was quite hard to discover in Calibre. Hopefully I learned something for my next title.
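
In case it helps anyone who runs into the same thing, here is a rough sketch of what I mean by a looser regex: match the footer by its text and accept any class attribute on the paragraph, rather than pinning the pattern to one class. The sample HTML, the class names (calibre1, calibre5, calibre9), the footer wording and the strip_footers helper are all made up for illustration; your converted HTML will look different, so treat this as a starting point, not a recipe.

```python
import re

# Hypothetical sample: the same footer emitted under two different classes.
html = (
    '<p class="calibre1">First paragraph of body text.</p>\n'
    '<p class="calibre5">My Book Title | Page 12</p>\n'
    '<p class="calibre1">Second paragraph of body text.</p>\n'
    '<p class="calibre9">My Book Title | Page 13</p>\n'
)

# Accept any class value ([^"]*) instead of hard-coding one class name,
# and identify the footer by its (assumed) text and page number instead.
FOOTER_RE = re.compile(
    r'<p\s+class="[^"]*">\s*My Book Title \| Page \d+\s*</p>\n?'
)

def strip_footers(text):
    """Remove every footer paragraph, whatever class it carries."""
    return FOOTER_RE.sub("", text)

cleaned = strip_footers(html)
print(cleaned)
```

Since Calibre's search and replace uses Python-style regular expressions, a pattern along these lines should in principle be usable there too, but I only tried mine outside Calibre.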