Hi everyone,
I'm converting an HTML page to EPUB. The title of the document is inside an h1 in the HTML header and the chapter titles are inside <h2> tags, but there is no attribute that contains the words 'chapter', 'section', etc. So I changed the XPath expression that detects chapters to this:
Code:
//*[((name()='h1' or name()='h2') and re:test(., '[a-z]+', 'i'))]
The regex is there to skip headers that are empty; there are a few of them.
This rule worked well, but it added the title of the book to the table of contents, resulting in a section without content. To avoid that, I removed the 'h1' from the expression:
Code:
//*[(name()='h2' and re:test(., '[a-z]+', 'i'))]
But after that change, calibre detects all the footnotes at the end of the document, inside a <footer>, as sections. I examined the HTML and there are no h1 or h2 tags in the footer. Am I missing something?
The structure of the HTML is something like this:
Code:
<section>
<h2 style="text-align: justify;"><a name="cap1"><strong>Lorem Ipsum</strong></a></h2>
<p style="text-align: justify;">Mauris egestas vestibulum eros convallis sodales. Curabitur semper sapien quis tellus tempor ultrices. Donec sagittis pellentesque metus, in tempus velit. Suspendisse consectetur pretium erat vel consequat. </p>
....
<footer>
<font size="-1">
<p style="text-align: justify;"><a href="#_ftnref1" name="_ftn1">[1]</a> Maecenas eu scelerisque justo, sed tristique dolor. </p>
...
</font>
</footer>
</section>
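To double-check the expression outside calibre, here is a rough sketch in Python that applies the same test (tag is h2 and the text contains at least one letter, case-insensitive) to a trimmed, well-formed copy of the structure above. It uses the standard library's ElementTree rather than calibre's own XPath engine, so it only approximates what calibre does. The function name `matching_headings` and the extra empty `<h2></h2>` are my own additions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the structure above.
# The empty <h2></h2> is a hypothetical addition to exercise the regex.
SAMPLE = """<section>
<h2 style="text-align: justify;"><a name="cap1"><strong>Lorem Ipsum</strong></a></h2>
<h2></h2>
<p style="text-align: justify;">Mauris egestas vestibulum eros convallis sodales.</p>
<footer>
<font size="-1">
<p style="text-align: justify;"><a href="#_ftnref1" name="_ftn1">[1]</a> Maecenas eu scelerisque justo, sed tristique dolor.</p>
</font>
</footer>
</section>"""


def matching_headings(markup, tags=("h2",)):
    """Return (tag, text) for elements whose tag is in `tags` and whose
    full text contains at least one ASCII letter, mimicking
    name()='h2' and re:test(., '[a-z]+', 'i')."""
    root = ET.fromstring(markup)
    hits = []
    for el in root.iter():
        # Like '.' in XPath: the concatenated text of the element and its descendants.
        text = "".join(el.itertext())
        if el.tag in tags and re.search(r"[a-z]+", text, re.IGNORECASE):
            hits.append((el.tag, text.strip()))
    return hits


print(matching_headings(SAMPLE))
```

On this sample only the "Lorem Ipsum" heading matches and nothing inside the footer does, which suggests the expression itself is not what selects the footnotes, at least under these assumptions.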
P.S.: I also tried with this, but it keeps detecting the footnotes as sections in the Table of Contents