Set pagebreaks at TOC entries

cybmole · 01-14-2011, 03:32 PM

I like seeing chapters begin at a page break on Kindle, but it does not always happen, even when I have a working chapter-by-chapter TOC.

In the cases where it does not happen, I guess it is because there are no actual header or chapter tags in the epub which I am converting to mobi, even though there is a valid & working TOC.

So is there something I can tweak during the epub to mobi conversion that will introduce a new page at each TOC entry point?

Or, to achieve it by another route, can I force calibre to recreate the epub but split its files at the TOC points, then convert that back to mobi? That would also work.

I can do it manually by splitting the epub in Sigil at each chapter point and then converting to mobi. The conversion adds a page break at each change of file, but that is tedious if there are lots of chapters.

janvanmaar · 01-14-2011, 04:06 PM

There might be a simpler way but I would do it by first doing a "dummy" conversion with debug on, have a look how the chapters are denoted in the html input in the debug directory and then supply that to the "Structure detection", "Insert page breaks before"

cybmole · 01-15-2011, 02:42 AM

Quote:

Originally Posted by janvanmaar

There might be a simpler way but I would do it by first doing a "dummy" conversion with debug on, have a look how the chapters are denoted in the html input in the debug directory and then supply that to the "Structure detection", "Insert page breaks before"

It varies from book to book - I do that anyway if I fix them up in Sigil.
Typically the chapter labels are on stand-alone lines but in simple <p> tags just like the rest of the book, & have to be spotted by their syntax - which could be CHAPTER ONE, CHAPTER 1, 1, I (roman numerals). I can usually construct a Sigil find-replace regex to pick them out, but can't get them via calibre structure detection due to the lack of unique tags.
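To make the kind of find-replace described above concrete, here is a minimal Python sketch of it. It is not the regex actually used in Sigil; the pattern, the `tag_chapters` name and the sample text are all assumptions made up for illustration. It looks for a paragraph whose only content is a chapter label (CHAPTER plus a word or number, a bare number, or a bare roman numeral) and rewrites it as an `<h2>` heading.

```python
import re

# A <p> (with or without attributes) containing nothing but a chapter label.
# Hypothetical pattern: adjust the alternatives to suit each book.
CHAPTER_P = re.compile(
    r"<p(?:\s[^>]*)?>\s*"             # opening <p ...> tag
    r"(CHAPTER\s+\w+|\d+|[IVXLC]+)"   # CHAPTER ONE / CHAPTER 1 / 1 / IV
    r"\s*</p>"                        # closing tag, nothing else in between
)

def tag_chapters(html):
    """Rewrite stand-alone chapter labels as <h2> headings."""
    return CHAPTER_P.sub(r"<h2>\1</h2>", html)

sample = (
    "<p>CHAPTER ONE</p>\n"
    "<p>It was a dark night.</p>\n"
    "<p>IV</p>\n"
    "<p>I went home.</p>"
)
print(tag_chapters(sample))
```

Because the label has to fill the whole paragraph, ordinary sentences that merely start with "I" are left alone. A one-word paragraph consisting only of "I" would still be tagged, so the result needs a visual check.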

ldolse · 01-15-2011, 03:06 AM

I assume you've already tried enabling preprocess under structure detection. If you haven't that's one of the things it's designed to do. If you have books that aren't working then open a bug at bugs.calibre-ebook.com and attach some examples.

cybmole · 01-15-2011, 03:31 AM

We've been around that loop in other threads; structure detection doesn't stand a chance if the chapter starts have the same html tags as the rest of the book. I can pick them out manually by reasoning that, say, a line with only a roman numeral is a chapter start, or a line that begins with CHAPTER in caps, but I can't add those constructs to structure detection easily.

We could test: I should be able to find a non-patched-up book in my collection which has chapters of the form CHAPTER..., inside simple <p> tags.
Give me the xpath expression for structure detection of that on an epub source, please.
When I try making them with the wizard & tweaking them, they either find far too much or nothing at all. I already figured out that the "i" bit sets case insensitivity, so I remove that, and I add class = bold if that's the chapter header style, but still no joy.

ldolse · 01-15-2011, 06:19 AM

We've been around that loop before, and I asked for a sample that doesn't work, and you declined.

I have lots of books formatted exactly as you describe, and preprocess works just fine for me. One of the reasons I implemented the feature is to solve exactly the problem you're describing. That said, I'm sure there are books where it doesn't work today. I'm happy to enhance the function when possible, but without examples of it not working there isn't much I can do.

cybmole · 01-15-2011, 07:37 AM

I will go find a couple now. Can you PM me with an email to send them to, as I can't attach books here?

cybmole · 01-15-2011, 07:48 AM

Here's one - html source - added to calibre, which converts it to zip - tried converting to epub with the relevant boxes ticked - no chapters found. The chapters are stand-alone digits.

tell me once you have it & I will remove it

Attachment deleted by moderator

Moderator Notice
IF YOU COPY COPYRIGHTED MATERIAL AGAIN TO MOBILEREAD YOU WILL BE BANNED. WE WILL NOT TOLERATE THIS UNDER ANY CIRCUMSTANCES. THIS IS YOUR ONE AND ONLY WARNING ON THE SUBJECT!

ldolse · 01-15-2011, 07:56 AM

I think I've explained to you before - you can't put copyrighted content on mobileread, it's against the forum rules. That's why I asked you to open a bug at bugs.calibre-ebook.com and attach the files there.

Please remove the attachment. I'll look at the book and let you know what's going on shortly.

cybmole · 01-15-2011, 07:59 AM

Here's a more typically frustrating one - lit source - I do both convert to epub & convert to mobi - I try with and without ticking "use auto generated TOC". No TOC appears either way for either output when viewed with the calibre reader, but when I inspect the epub code I see h2 tags around the chapter numbers!

Code:

<h2 class="calibre2" id="calibre_pb_18">72</h2>

attaching a zip of the source lit - can you get it to convert and to detect & show chapters ?

Update - forget that one - it's an embarrassingly bad source, now that I look at it, with lots of orphaned page numbers.

i'll look for an alternative.

ldolse · 01-15-2011, 08:14 AM

The preprocess option found the chapters just fine. Just to be clear, when I tell you to enable preprocess, you need to check the box next to 'preprocess input file to possibly improve structure detection'.

All the numeric chapter headings were wrapped in <h2> tags by the preprocess function (except the first one, which uses the letter 'I' instead of a '1').

With the default settings your file WILL be split into a single file per chapter - this requires that you haven't changed the global defaults.

The default setting for 'insert page breaks before' should be:

Code:

//*[name()='h1' or name()='h2']

This is the default setting - if you see something different it means you must have changed it in the global preferences.

If you've gone and changed that then you won't get the document split into multiple flows at the correct points.

TOC creation will not happen automatically (with the default settings) with these types of chapter headings, but that's trivial to fix. All of these chapter headings are numeric - that can be represented using regular expressions with \d+

So change the default TOC xpath from this:

Code:

//*[((name()='div' or name()='h2') and re:test(., 'chapter|book|section|part\s+', 'i')) or @class = 'chapter']

To this:

Code:

//*[((name()='h1' or name()='h2') and re:test(., '\d+', 'i')) or @class = 'chapter']

If you can't figure out how to fix the xpath, preprocessing did work anyway. All you would need to do is open this document up in Sigil and then save it, and you would automatically get a TOC, since Sigil autogenerates a TOC based on heading tags, and preprocess successfully inserted them.
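As a rough illustration of what the modified expression selects, here is a minimal Python sketch that emulates its logic outside calibre. It is not calibre's code: the `toc_candidates` name and the sample markup are hypothetical, and the sample has no XHTML namespace, which a real epub file would have. The sketch keeps any `h1` or `h2` whose text contains a match for the pattern, plus any element with `class="chapter"`.

```python
import re
import xml.etree.ElementTree as ET

def toc_candidates(xhtml, pattern=r"\d+"):
    """Emulate: (h1 or h2 whose text matches pattern) or class='chapter'."""
    root = ET.fromstring(xhtml)
    hits = []
    for el in root.iter():
        text = "".join(el.itertext())
        # re.search, like re:test, matches anywhere in the text
        is_heading = el.tag in ("h1", "h2") and re.search(pattern, text, re.I)
        if is_heading or el.get("class") == "chapter":
            hits.append(text.strip())
    return hits

# Made-up sample: numeric heading, body text with digits, an 'I' heading
doc = "<body><h2>72</h2><p>Some text from 1999.</p><h2>I</h2><h2>Notes</h2></body>"
print(toc_candidates(doc))  # ['72']
```

The paragraph containing "1999" is ignored because it is not a heading, and the heading "I" is skipped because it has no digit, which is the same gap noted above for the first chapter of this book.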

ldolse · 01-15-2011, 08:17 AM

.....

cybmole · 01-15-2011, 08:38 AM

Right - let's diagnose this before I get lynched by the mods :-)

I have the same zip as you. I tick "restore defaults" on all the conversion screens & convert to epub, then view that in the calibre viewer - the only TOC entry is "start".

Repeat, but tick auto generate TOC on the TOC screen... same result.

I open the epub with Sigil and I can see that structure detection has done its thing: it has added the h2 tags & has created 1 file per chapter. But I am not seeing the benefit in the form of a viewable TOC.

So we are both apparently right - your structure detection code IS working but I am not getting a TOC to view...

AHA

I think you explained that actually, 2 posts up... I need to open & close with Sigil to complete the TOC generation, & that works & is fine for me. Others may want a calibre-only solution, and that would be to modify the xpath?

I was maybe close to the answer but still in the dark !

So, model answer:
Step 1: enable structure detection - no need to enable auto generate TOC. Convert to epub.
At this stage the chapters have been tagged & the epub has been split with 1 file per chapter.
Step 2: open & close the epub in Sigil.

One final query on xpath - I come unstuck if I try to modify the end bit (the "or @class = 'chapter'" bit) in order to add an AND - is that because it is not possible?

Say I wanted to replace the @class = 'chapter' test with a test for bold roman numerals, so I'd want something like ( @class = 'bold' AND re:test( [XIV]+ ). I'm not too clear on the re:test syntax but have tried lots of invalid permutations!

Is that doable, and could you give a model answer for that please?
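One point that may explain the invalid permutations: XPath operators are case-sensitive, so the conjunction has to be written as lowercase `and`, and `re:test` takes the text to examine as its first argument and the quoted pattern as its second. An untested guess at the expression would be `//*[@class = 'bold' and re:test(., '^\s*[IVXLC]+\s*$')]`. The Python sketch below only emulates that logic on made-up markup; the `bold_roman_headings` name, the `bold` class and the sample are assumptions, and it has not been checked against calibre itself.

```python
import re
import xml.etree.ElementTree as ET

# The whole text must be a roman numeral, allowing surrounding whitespace
ROMAN_ONLY = re.compile(r"^\s*[IVXLC]+\s*$")

def bold_roman_headings(xhtml):
    """Emulate: @class = 'bold' and re:test(., '^\\s*[IVXLC]+\\s*$')."""
    root = ET.fromstring(xhtml)
    hits = []
    for el in root.iter():
        text = "".join(el.itertext())
        if el.get("class") == "bold" and ROMAN_ONLY.match(text):
            hits.append(text.strip())
    return hits

# Hypothetical sample: only bold paragraphs holding a bare numeral qualify
doc = (
    '<body><p class="bold">XIV</p><p class="bold">Important!</p>'
    '<p>IV</p><p class="bold"> II </p></body>'
)
print(bold_roman_headings(doc))  # ['XIV', 'II']
```

Anchoring the pattern with `^` and `$` is what stops bold text such as "Important!" from matching just because it begins with a roman-numeral letter.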

ldolse · 01-15-2011, 08:44 AM

The default settings will only split on H1 and H2 tags. It won't auto-magically put the chapters in a TOC. For that you need the right xpath.

That's why I told you to change the 'Detect chapters at (XPath expression)' setting.

Change it from the default to this:

Code:

//*[((name()='h1' or name()='h2') and re:test(., '\d+', 'i')) or @class = 'chapter']

And if you can't figure out Xpath for other files just open the file in Sigil and re-save it after using the pre-process option, you'll automatically get a TOC.

In the future use Calibre's bug reporting system if you have a file that you think doesn't work.

cybmole · 01-15-2011, 09:10 AM

Thank you for your patience. It IS getting clearer gradually.

To sum up so far:

Begin by converting the source, in whatever format, to epub.

Special case - if the source is already EPUB but lacks header tags, then make a copy of it, rename that copy to .zip, add it back to calibre as .zip, discard the bad epub, and continue as per the method for non-epub sources below.

The purpose of structure detection (amongst other things) is to identify and tag chapter starts, but it does not, by itself, create a TOC.

The xpath line defines how to identify which elements to place in a TOC.

If I can't customise the xpath successfully then I should use Sigil: open, patch with regex if needed, save.

The auto-generate TOC option, on the TOC screen, is not relevant at this (create a good epub) stage, & should not be needed thereafter when I convert the epub to mobi for its final destination on Kindle.

yes - I'll go learn how to use the bug reports.
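For anyone doing the special case in the summary above on many books, the copy-and-rename step can be scripted. This is only a sketch of that one step, with a hypothetical `copy_epub_as_zip` helper; adding the resulting .zip to calibre and discarding the old epub are still done by hand.

```python
import shutil
from pathlib import Path

def copy_epub_as_zip(epub_path):
    """Copy book.epub to book.zip in the same folder and return the new path.

    The original epub is left untouched. An existing book.zip would be
    overwritten, so check for one first if that matters.
    """
    src = Path(epub_path)
    dst = src.with_suffix(".zip")
    shutil.copyfile(src, dst)
    return dst

# Example call (the path is made up):
# copy_epub_as_zip("library/some_book.epub")
```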
