Quote:
Originally Posted by erion
Hello all,
I'm trying to restructure an Epub file, splitting chapters into separate html files.
The book has a proper ToC, so there is no problem on that side. However when it comes to splitting the files, the following happens:
The book has page breaks in <div> tags, such as
Code:
<div class="mbppagebreak"></div>
where Calibre splits accordingly. However, when an <h2> tag is immediately followed by another <h2> tag, there is a file split between the two.
Using the command line converter (ebook-convert), I've tried turning heuristics on (--enable-heuristics), resulting in the whole chapter wrapped in <h1> tags. I also tried specifying '--chapter-mark none', which should not apply page breaks before detected chapters (i.e. h2 tags).
Then came the XPath way, setting it to heading 6 which is not present in the document: '--page-breaks-before //h:h6'.
Sadly, none of these seem to help. Either the chapter title ends up in the previous html file and the second h2 tag with the actual text in the next one, or the first h2 tag gets a file of its own and the second h2 tag lands in the next file with the chapter text.
It's also worth mentioning that sometimes there is only one h2 tag, and sometimes there are two, when the chapter has a date.
Could anyone please point out what could be wrong here?
Edit: h2 tag meaning <h2> and a closing </h2>
Erion
|
Nothing wrong. You just found a condition that needs human help.
There is a Convert (Heuristics) preference that re-assigns (downwards) a second consecutive H tag.
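
If you would rather fix the source by hand before converting, the same idea can be sketched in a few lines of Python. This is only an illustration of "re-assign the second consecutive heading downwards", not Calibre's own code; the function name demote_second_heading and the regex are my assumptions, and a regex is only good enough for simple, well-formed markup.

```python
import re

# Heading content: anything that does not open or close another heading.
BODY = r'(?:(?!</?h[1-6]\b).)*?'

# Two headings of the same level separated only by whitespace.
PAIR = re.compile(
    r'(?P<first><h(?P<level>[1-5])\b[^>]*>' + BODY + r'</h(?P=level)>\s*)'
    r'<h(?P=level)\b(?P<attrs>[^>]*)>(?P<body>' + BODY + r')</h(?P=level)>',
    re.IGNORECASE | re.DOTALL,
)


def demote_second_heading(html):
    """Hypothetical helper: lower the second of two consecutive
    same-level headings by one level (e.g. h2 followed by h2 -> h2, h3)."""
    def repl(match):
        lower = int(match.group('level')) + 1
        return (
            f"{match.group('first')}"
            f"<h{lower}{match.group('attrs')}>{match.group('body')}</h{lower}>"
        )
    return PAIR.sub(repl, html)


sample = '<h2>Chapter One</h2>\n<h2>1 January</h2>\n<p>Text</p>'
print(demote_second_heading(sample))
```

With the date line turned into an <h3>, a split on h2 should no longer fall between the chapter title and its date. Chapters with a single h2 are left untouched.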