Old 12-03-2011, 08:21 AM   #1
erion
Epub to Epub: issue with Multiple headings and page breaks

Hello all,
I'm trying to restructure an EPUB file, splitting the chapters into separate HTML files.
The book has a proper ToC, so there is no problem on that side. However, when it comes to splitting the files, the following happens:
The book has page breaks in <div> tags, such as
Code:
<div class="mbppagebreak"></div>
where Calibre splits accordingly. However, when an <h2> tag is immediately followed by another <h2> tag, Calibre also splits the file between the two.
Using the command line converter (ebook-convert), I've tried turning heuristics on (--enable-heuristics), which resulted in the whole chapter being wrapped in <h1> tags. I also tried specifying '--chapter-mark none', which should stop page breaks from being applied before detected chapters (i.e. h2 tags).
Then I tried the XPath way, pointing it at heading level 6, which is not present in the document: '--page-breaks-before //h:h6'.
Sadly, none of these seem to help. Either the chapter title ends up at the end of the previous HTML file, with the second h2 tag and the actual text in the next one, or the first h2 tag lands in a file of its own, with the second h2 tag and the chapter text in the file after it.
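For reference, the option combination described above can be assembled as in this minimal Python sketch. The file names and the helper name build_convert_command are placeholders chosen for illustration; only the three ebook-convert options come from the post itself.

```python
import shlex


def build_convert_command(src, dst, page_breaks_xpath="//h:h6",
                          chapter_mark="none", heuristics=False):
    """Return the ebook-convert argument list for the options tried above."""
    cmd = [
        "ebook-convert", src, dst,
        "--chapter-mark", chapter_mark,             # how detected chapters are marked
        "--page-breaks-before", page_breaks_xpath,  # XPath that triggers a page break
    ]
    if heuristics:
        cmd.append("--enable-heuristics")
    return cmd


# Placeholder file names; print a copy-pasteable command line.
# To actually run it, pass the list to subprocess.run(cmd, check=True).
print(shlex.join(build_convert_command("book.epub", "book_split.epub")))
```

Building the arguments as a list rather than one string avoids shell quoting problems with the XPath expression.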
It's also worth mentioning that some chapters have only one h2 tag, while others have two, when the chapter has a date.
Could anyone please point out what could be wrong here?
Edit: h2 tag meaning <h2> and a closing </h2>
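
To see which chapters are affected, the "h2 directly followed by h2" condition can be detected with a short script. This is only a sketch: the sample markup, including the heading texts and the date, is made up for illustration, and a real book file may need extra handling (for example named entities such as &nbsp;, which the standard library parser rejects).

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
H2 = f"{{{XHTML_NS}}}h2"


def find_consecutive_h2(xhtml_text):
    """Return (first, second) heading texts for every h2 directly followed by an h2."""
    root = ET.fromstring(xhtml_text)
    pairs = []
    for parent in root.iter():
        children = list(parent)
        for first, second in zip(children, children[1:]):
            # Only whitespace may sit between the two headings.
            only_whitespace_between = not (first.tail or "").strip()
            if first.tag == H2 and second.tag == H2 and only_whitespace_between:
                pairs.append(("".join(first.itertext()).strip(),
                              "".join(second.itertext()).strip()))
    return pairs


# Made-up sample: chapter one has a title and a date, chapter two only a title.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="mbppagebreak"></div>
<h2>Chapter One</h2>
<h2>3 December 2011</h2>
<p>Text of the chapter.</p>
<div class="mbppagebreak"></div>
<h2>Chapter Two</h2>
<p>More text.</p>
</body></html>"""

print(find_consecutive_h2(SAMPLE))  # [('Chapter One', '3 December 2011')]
```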

Erion

Last edited by erion; 12-03-2011 at 08:23 AM.
Old 12-03-2011, 09:15 AM   #2
theducks
Nothing wrong. You just found a condition that needs human help.

There is a Convert (Heuristics) preference that re-assigns (downwards) the second of two consecutive H tags.
Old 12-03-2011, 10:25 AM   #3
erion
Hi,

The problem is that the renumbering does not actually happen. Even if I manually change the second h2 tag of the first chapter (the one with the date) to an <h3>, the chapter title in its <h2> tag ends up in a separate file (probably because of the page break right before it), and the date in its <h3> tag ends up in the next file, together with the chapter's text.
The original .epub's first html file ends with the chapter title of the first chapter, and in the next file is the date and the text.
This is just a wild guess, but could it be that for some odd reason Calibre honours the original file splits as well?

Erion
Old 12-03-2011, 12:52 PM   #4
Serpentine
Use Sigil for this kind of work. You just insert "SGF Chapter Markers" (hr tags with a specific class) where needed, then use the "split on chapter markers" function.

You can try to automate it so that markers are only inserted at the first h tag of each chapter; otherwise it's usually quick enough to do by hand.
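
A rough sketch of that automation in Python follows. It puts a marker before every h2 that is not directly preceded by another h2, so the second heading of a title/date pair is skipped. The marker string is an assumption: the class name Sigil expects depends on the Sigil version, so check what your copy inserts and pass that instead.

```python
import re

# ASSUMPTION: marker class name; verify against what your Sigil version inserts.
MARKER = '<hr class="sigilChapterBreak" />'

H2_OPEN = re.compile(r"<h2\b")


def insert_markers(xhtml_text, marker=MARKER):
    """Insert `marker` before each <h2> that does not directly follow a </h2>."""
    out = []
    last = 0
    for match in H2_OPEN.finditer(xhtml_text):
        before = xhtml_text[:match.start()].rstrip()
        if before.endswith("</h2>"):
            continue  # second heading of a pair: no marker here
        out.append(xhtml_text[last:match.start()])
        out.append(marker + "\n")
        last = match.start()
    out.append(xhtml_text[last:])
    return "".join(out)


# Made-up sample markup for illustration.
sample = ("<body>\n<h2>Chapter One</h2>\n<h2>Date</h2>\n<p>x</p>\n"
          "<h2>Chapter Two</h2>\n<p>y</p>\n</body>")
print(insert_markers(sample))
```

This works on the raw text rather than a parsed tree, so it leaves the rest of the markup byte-for-byte unchanged; it will not notice headings separated by comments or other non-whitespace content.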
Old 12-03-2011, 06:21 PM   #5
theducks
If it still breaks on H3 as well as H2, check whether you changed the chapter detection (normally H1 and H2) to include h3.

Also look at the stylesheet and see whether a page-break-before is assigned to the class of those headers.
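
The stylesheet check can be sketched in a few lines of Python. This is a simple regex scan, not a real CSS parser: it ignores nested blocks such as @media rules, and the class names in the sample (h2.title, h2.date) are invented for illustration.

```python
import re

RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")
BREAK = re.compile(r"page-break-before\s*:\s*always", re.IGNORECASE)


def selectors_forcing_break(css_text):
    """Return the selectors of every rule that sets page-break-before: always."""
    # Drop comments first so they cannot hide or fake a declaration.
    css_text = re.sub(r"/\*.*?\*/", "", css_text, flags=re.DOTALL)
    hits = []
    for selector, body in RULE.findall(css_text):
        if BREAK.search(body):
            hits.extend(s.strip() for s in selector.split(","))
    return hits


# Made-up stylesheet for illustration.
SAMPLE_CSS = """
h2.title { font-size: 1.5em; }
h2.date  { page-break-before: always; font-style: italic; }
.mbppagebreak { page-break-before: always; }
"""

print(selectors_forcing_break(SAMPLE_CSS))  # ['h2.date', '.mbppagebreak']
```

Any heading selector that shows up in the result is a candidate for the unexpected split.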
Old 12-04-2011, 06:06 AM   #6
erion
Thanks, this seems to be the problem. The second h2, i.e. the date, has 'page-break-before: always' assigned to it. It does not make any sense to break there, but that's a different story.
Huge thanks to everyone who replied or even thought about it!
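
For anyone hitting the same thing, one possible way to neutralise such a rule before converting is sketched below. It is an assumption-laden illustration, not Calibre functionality: the selector name h2.date is invented, and the regex only handles flat rules. Note that if the selector is part of a grouped rule, the change applies to the whole group.

```python
import re

RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def disable_break_for(css_text, selector):
    """Rewrite 'page-break-before: always' to 'auto' in rules listing `selector`."""
    def fix(match):
        selectors = [s.strip() for s in match.group(1).split(",")]
        if selector not in selectors:
            return match.group(0)  # leave unrelated rules untouched
        body = re.sub(r"(page-break-before\s*:\s*)always", r"\1auto",
                      match.group(2), flags=re.IGNORECASE)
        return match.group(1) + "{" + body + "}"
    return RULE.sub(fix, css_text)


# Made-up stylesheet; only the hypothetical date heading rule is changed.
css = ("h2.date { page-break-before: always; }\n"
       ".mbppagebreak { page-break-before: always; }")
print(disable_break_for(css, "h2.date"))
```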

Erion