MobileRead Forums

kapono · 04-13-2020, 08:06 AM

Hi everyone,

I'm converting an HTML page to EPUB. The page has the document title inside an <h1> in the HTML header and the chapter titles inside <h2> tags, but no attribute contains words like 'chapter' or 'section'. So I changed the XPath expression that detects chapters to this:

Code:

//*[((name()='h1' or name()='h2') and re:test(., '[a-z]+', 'i'))]

The regex is there to skip empty headings, since there are a few of them.
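For reference, `re:test(., '[a-z]+', 'i')` keeps a heading only if its string value (all the text inside it, including nested `<a>` and `<strong>`) contains at least one letter, case-insensitively. Below is a rough stdlib Python sketch of the same filter. It assumes a made-up document (`DOC`) and a hypothetical helper `chapter_headings`, and `xml.etree.ElementTree` only stands in for calibre's real XPath engine. It shows the empty heading being skipped and the title being picked up along with the chapter:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XHTML: a title in <h1>, one empty <h2>, one chapter <h2>.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>Book Title</h1>
<h2> </h2>
<h2><a name="cap1"><strong>Lorem Ipsum</strong></a></h2>
<p>Chapter text.</p>
</body></html>"""

XHTML = "{http://www.w3.org/1999/xhtml}"
# Same pattern and flag as re:test(., '[a-z]+', 'i')
LETTER = re.compile(r"[a-z]+", re.IGNORECASE)


def chapter_headings(root, names=("h1", "h2")):
    """Rough equivalent of
    //*[((name()='h1' or name()='h2') and re:test(., '[a-z]+', 'i'))]"""
    wanted = {XHTML + n for n in names}
    hits = []
    for el in root.iter():  # document order
        if el.tag not in wanted:
            continue
        # XPath '.' here is the string value: all text inside the element.
        text = "".join(el.itertext())
        if LETTER.search(text):
            hits.append(text.strip())
    return hits


root = ET.fromstring(DOC)
print(chapter_headings(root))           # ['Book Title', 'Lorem Ipsum']
print(chapter_headings(root, ("h2",)))  # ['Lorem Ipsum']
```

Passing `("h2",)` mimics the h2-only version of the expression and drops the title.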

This rule worked well, but it also added the title of the book to the table of contents, resulting in a section without content. To avoid that, I removed 'h1' from the expression:

Code:

//*[(name()='h2' and re:test(., '[a-z]+', 'i'))]

But after that change, calibre detects all the footnotes at the end of the document, which sit inside a <footer>, as sections. I examined the HTML and there are no h1 or h2 tags in the footer. Am I missing something?

The structure of the HTML is something like this:

Code:

<section>
  <h2 style="text-align: justify;"><a name="cap1"><strong>Lorem Ipsum</strong></a></h2>
  <p style="text-align: justify;">Mauris egestas vestibulum eros convallis sodales. Curabitur semper sapien quis tellus tempor ultrices. Donec sagittis pellentesque metus, in tempus velit. Suspendisse consectetur pretium erat vel consequat. </p>
  ....
  <footer>
    <font size="-1">
       <p style="text-align: justify;"><a href="#_ftnref1" name="_ftn1">[1]</a> Maecenas eu scelerisque justo, sed tristique dolor. </p>
       ...
    </font>
  </footer>
</section>
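One way to double-check that nothing in the footer can match is to run the sample structure through the same kind of filter. This sketch wraps it in a minimal XHTML document (an assumption: the `....`/`...` placeholders are dropped and the paragraphs shortened) and again uses Python's `xml.etree.ElementTree` rather than calibre's own XPath engine. Only the chapter heading matches the h2-only expression, and the footer contains no heading at all. That suggests the footnote entries in the table of contents are not produced by this expression matching footer elements:

```python
import re
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}
# Sample structure from the post, wrapped in a minimal XHTML document.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<section>
  <h2 style="text-align: justify;"><a name="cap1"><strong>Lorem Ipsum</strong></a></h2>
  <p style="text-align: justify;">Mauris egestas vestibulum eros convallis sodales.</p>
  <footer>
    <font size="-1">
      <p style="text-align: justify;"><a href="#_ftnref1" name="_ftn1">[1]</a> Maecenas eu scelerisque justo.</p>
    </font>
  </footer>
</section>
</body></html>"""

root = ET.fromstring(DOC)
letter = re.compile(r"[a-z]+", re.IGNORECASE)

# Counterpart of //*[(name()='h2' and re:test(., '[a-z]+', 'i'))]
h2_hits = [
    "".join(el.itertext()).strip()
    for el in root.iterfind(".//h:h2", NS)
    if letter.search("".join(el.itertext()))
]

# Any heading element (h1..h6) located inside the <footer>
footer = root.find(".//h:footer", NS)
heading_tags = {"{%s}h%d" % (NS["h"], n) for n in range(1, 7)}
footer_headings = [el.tag for el in footer.iter() if el.tag in heading_tags]

print(h2_hits)          # ['Lorem Ipsum']
print(footer_headings)  # []
```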

P.S.: I also tried this, but it still detects the footnotes as sections in the table of contents:

Code:

//h:section/h:h2
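calibre's XPath expressions use the `h:` prefix for the XHTML namespace, so `//h:section/h:h2` selects `<h2>` elements that are direct children of a `<section>`. Python's `xml.etree.ElementTree` supports this much of XPath when given a prefix mapping, so here is a small sketch against a hypothetical document (`NS` is an assumed mapping that mirrors calibre's `h` prefix). It also shows that leaving the prefix out matches nothing in a namespaced XHTML file:

```python
import xml.etree.ElementTree as ET

# Assumption: 'h' is bound to the XHTML namespace, as in calibre's XPath expressions.
NS = {"h": "http://www.w3.org/1999/xhtml"}

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<section>
  <h2><a name="cap1"><strong>Lorem Ipsum</strong></a></h2>
  <p>Chapter text.</p>
  <footer><p><a href="#_ftnref1" name="_ftn1">[1]</a> Footnote text.</p></footer>
</section>
</body></html>"""

root = ET.fromstring(DOC)

# ElementTree's limited-XPath counterpart of //h:section/h:h2:
# an <h2> that is a direct child of a <section>, both in the XHTML namespace.
with_prefix = ["".join(h.itertext()) for h in root.findall(".//h:section/h:h2", NS)]

# Unprefixed names refer to elements in no namespace, so nothing matches here.
without_prefix = root.findall(".//section/h2")

print(with_prefix)          # ['Lorem Ipsum']
print(len(without_prefix))  # 0
```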

Footnotes being detected as chapters · post #1 · kapono (Junior Member, Device: Kobo Clara HD)