12-03-2011, 09:15 AM   #2
theducks
Well trained by Cats
Posts: 31,099
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by erion
Hello all,
I'm trying to restructure an Epub file, splitting chapters into separate html files.
The book has a proper ToC, so there is no problem on that side. However when it comes to splitting the files, the following happens:
The book has page breaks in <div> tags, such as
Code:
<div class="mbppagebreak"></div>
where Calibre splits accordingly. However, when a <h2> tag is followed by another <h2> tag, there is a file split between the two.
Using the command line converter (ebook-convert), I've tried turning heuristics on (--enable-heuristics), resulting in the whole chapter wrapped in <h1> tags. I also tried specifying '--chapter-mark none', which should not apply page breaks before detected chapters (i.e. h2 tags).
Then came the XPath way, setting it to heading 6 which is not present in the document: '--page-breaks-before //h:h6'.
Sadly, none of these seem to help. Either the chapter title ends up in the previous html file and the next file starts with the second h2 tag and the actual text, or the first h2 tag is in a separate file and the next file holds the second h2 tag with the chapter text.
It's also worth mentioning that sometimes there is only one h2 tag and sometimes there are two, when the chapter has a date.
Could anyone please point out what could be wrong here?
Edit: h2 tag meaning <h2> and a closing </h2>

Erion
Nothing wrong. You just found a condition that needs human help.

There is a Convert (Heuristics) preference that re-assigns (downwards) the second of two consecutive H tags, so the pair is no longer treated as two chapter headings.
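
If you would rather give that "human help" by hand before converting, the same idea can be sketched in a few lines of Python. This is only an illustration of demoting the second of two back-to-back <h2> tags, not Calibre's own code: the function name demote_second_h2 and the choice of <h3> as the target level are assumptions, and a regex like this is only suitable for simple, well-formed markup.

```python
import re

# Two <h2> elements separated only by whitespace.
# Group 1: the first heading plus the whitespace after it.
# Group 2: the attributes of the second heading.
# Group 3: the contents of the second heading.
CONSECUTIVE_H2 = re.compile(
    r'(<h2\b[^>]*>(?:(?!</h2>).)*</h2>\s*)'
    r'<h2\b([^>]*)>((?:(?!</h2>).)*)</h2>',
    re.DOTALL | re.IGNORECASE,
)


def demote_second_h2(html: str) -> str:
    """Rewrite the second of two consecutive <h2> tags as an <h3> (hypothetical helper)."""
    return CONSECUTIVE_H2.sub(r'\1<h3\2>\3</h3>', html)


if __name__ == "__main__":
    sample = (
        '<h2>Chapter 1</h2>\n'
        '<h2 class="date">3 May 1920</h2>\n'
        '<p>Text</p>\n'
        '<h2>Chapter 2</h2>\n'
        '<p>More text</p>'
    )
    print(demote_second_h2(sample))
```

With the date line demoted, a split on h2 should no longer fall between the chapter title and its date, and chapters that have only a single h2 are left untouched.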