04-13-2020, 08:06 AM   #1
kapono
Junior Member
Posts: 1
Karma: 10
Join Date: Apr 2020
Device: Kobo Clara HD
Footnotes being detected as chapters

Hi everyone,

I'm converting an HTML page to EPUB. The page has the title of the document inside an h1 in the HTML header and the chapter titles inside <h2> tags, but there is no attribute containing words like 'chapter' or 'section'. So I changed the XPath expression that detects chapters to this:
Code:
//*[((name()='h1' or name()='h2') and re:test(., '[a-z]+', 'i'))]
The regex is there to skip empty headings; there are a few of them.
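To see what that expression selects, here is a rough stand-in using only Python's standard library. It is not calibre's XPath engine: it just reproduces the two conditions of the expression (element named h1 or h2, and a string value containing at least one letter, matched case-insensitively like `re:test(., '[a-z]+', 'i')`). The sample document and the helper name `select_headings` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a title in <h1>, one empty <h2>, and two real chapter headings.
SAMPLE = """
<html>
  <body>
    <h1>Book Title</h1>
    <h2> </h2>
    <h2><a name="cap1"><strong>Lorem Ipsum</strong></a></h2>
    <p>Body text.</p>
    <h2>Dolor Sit</h2>
  </body>
</html>
"""

# Same pattern and flag as re:test(., '[a-z]+', 'i') in the XPath expression.
HAS_LETTERS = re.compile(r"[a-z]+", re.IGNORECASE)


def select_headings(root, names):
    """Approximate //*[(name()=... or ...) and re:test(., '[a-z]+', 'i')].

    The XPath string value of an element ('.') is the concatenation of all
    its descendant text nodes, which itertext() yields in document order.
    """
    matches = []
    for el in root.iter():
        if el.tag in names:
            text = "".join(el.itertext())
            if HAS_LETTERS.search(text):
                matches.append(text.strip())
    return matches


root = ET.fromstring(SAMPLE)
print(select_headings(root, {"h1", "h2"}))  # → ['Book Title', 'Lorem Ipsum', 'Dolor Sit']
print(select_headings(root, {"h2"}))        # → ['Lorem Ipsum', 'Dolor Sit']
```

The empty `<h2>` is skipped by the regex, and the h1 title is only dropped once "h1" is removed from the set of names.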

This rule worked well, but it also added the title of the book to the Table of Contents, resulting in a section without content. To avoid that, I removed 'h1' from the expression:
Code:
//*[(name()='h2' and re:test(., '[a-z]+', 'i'))]
But after that change, calibre detects all the footnotes at the end of the document, which sit inside a <footer>, as sections. I examined the HTML and there are no h1 or h2 tags in the footer. Am I missing something?

The structure of the HTML is something like this:
Code:
<section>
  <h2 style="text-align: justify;"><a name="cap1"><strong>Lorem Ipsum</strong></a></h2>
  <p style="text-align: justify;">Mauris egestas vestibulum eros convallis sodales. Curabitur semper sapien quis tellus tempor ultrices. Donec sagittis pellentesque metus, in tempus velit. Suspendisse consectetur pretium erat vel consequat. </p>
  ....
  <footer>
    <font size="-1">
       <p style="text-align: justify;"><a href="#_ftnref1" name="_ftn1">[1]</a> Maecenas eu scelerisque justo, sed tristique dolor. </p>
       ...
    </font>
  </footer>
</section>
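As a sanity check on the structure above (body text shortened), a standard-library approximation of the h2-only expression can be run against it. This only shows what the expression itself matches, not how calibre builds its Table of Contents; the variable and helper names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# The posted structure, with the body text shortened.
POSTED = """
<section>
  <h2 style="text-align: justify;"><a name="cap1"><strong>Lorem Ipsum</strong></a></h2>
  <p style="text-align: justify;">Mauris egestas vestibulum eros convallis sodales.</p>
  <footer>
    <font size="-1">
      <p style="text-align: justify;"><a href="#_ftnref1" name="_ftn1">[1]</a> Maecenas eu scelerisque justo, sed tristique dolor.</p>
    </font>
  </footer>
</section>
"""

HAS_LETTERS = re.compile(r"[a-z]+", re.IGNORECASE)


def string_value(el):
    # XPath string value: all descendant text nodes concatenated.
    return "".join(el.itertext())


root = ET.fromstring(POSTED)

# Approximates //*[(name()='h2' and re:test(., '[a-z]+', 'i'))]
h2_matches = [string_value(el) for el in root.iter("h2")
              if HAS_LETTERS.search(string_value(el))]

# Element types present inside <footer> (including the footer itself).
footer_tags = sorted({el.tag for el in root.find("footer").iter()})

print(h2_matches)   # → ['Lorem Ipsum']
print(footer_tags)  # → ['a', 'font', 'footer', 'p']
```

The expression matches only the chapter heading, and the footer holds just font, p and a elements, consistent with the observation that it has no h1 or h2 tags.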
P.S.: I also tried this, but it still detects the footnotes as sections in the Table of Contents:
Code:
//h:section/h:h2
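The `h:` prefix in that expression refers to the XHTML namespace. The sketch below mimics the path with Python's ElementTree, assuming (as in calibre's XPath examples) that `h` is bound to `http://www.w3.org/1999/xhtml`; the sample document is a trimmed, hypothetical XHTML version of the structure above.

```python
import xml.etree.ElementTree as ET

# Assumption: the chapter file is XHTML, so every element is in this namespace,
# and the "h" prefix of //h:section/h:h2 is bound to it.
NS = {"h": "http://www.w3.org/1999/xhtml"}

DOC = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <section>
      <h2><a name="cap1"><strong>Lorem Ipsum</strong></a></h2>
      <p>Mauris egestas vestibulum eros convallis sodales.</p>
      <footer>
        <p><a href="#_ftnref1" name="_ftn1">[1]</a> Maecenas eu scelerisque justo.</p>
      </footer>
    </section>
  </body>
</html>
"""

root = ET.fromstring(DOC)

# ElementPath equivalent of //h:section/h:h2: h2 elements that are direct
# children of a section anywhere below the root.
headings = root.findall(".//h:section/h:h2", NS)
print(["".join(h.itertext()) for h in headings])  # → ['Lorem Ipsum']

# Without the prefix nothing matches, because the elements are namespaced.
print(root.findall(".//section/h2"))  # → []
```

Like the regex version, this path selects only the chapter heading in the sample, not anything inside the footer.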