
MobileRead Forums > E-Book Software > Calibre

01-14-2011, 03:32 PM   #1
cybmole
Wizard
cybmole ought to be getting tired of karma fortunes by now.
 
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Set page breaks at TOC entries

I like seeing chapters begin at a page break on Kindle, but it does not always happen, even when I have a working chapter-by-chapter TOC.

In the cases where it does not happen, I guess it is because there are no actual header or chapter tags in the EPUB I am converting to MOBI, even though there is a valid and working TOC.

So is there something I can tweak during the EPUB-to-MOBI conversion that will start a new page at each TOC entry point?

Or, to achieve it by another route, can I force calibre to recreate the EPUB but split its files at the TOC points, and then convert that back to MOBI? Would that also work?

I can do it manually by splitting the EPUB in Sigil at each chapter point and then converting to MOBI; the conversion adds a page break at each change of file. But that is tedious if there are lots of chapters.
01-14-2011, 04:06 PM   #2
janvanmaar
Addict
janvanmaar has a complete set of Star Wars action figures.
 
Posts: 219
Karma: 404
Join Date: Nov 2010
Device: Kindle 3G, Samsung SIII
There might be a simpler way but I would do it by first doing a "dummy" conversion with debug on, have a look how the chapters are denoted in the html input in the debug directory and then supply that to the "Structure detection", "Insert page breaks before"
01-15-2011, 02:42 AM   #3
cybmole
Wizard
Quote:
Originally Posted by janvanmaar
There might be a simpler way but I would do it by first doing a "dummy" conversion with debug on, have a look how the chapters are denoted in the html input in the debug directory and then supply that to the "Structure detection", "Insert page breaks before"
It varies from book to book; I do that anyway if I fix them up in Sigil.
Typically the chapter labels are on stand-alone lines but in simple <p> tags, just like the rest of the book, and have to be spotted by their syntax, which could be CHAPTER ONE, CHAPTER 1, 1, or I (roman numerals). I can usually construct a Sigil find-and-replace regex to pick them out, but I can't get them via calibre structure detection due to the lack of unique tags.
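A minimal sketch of the kind of find-and-replace regex described above, written with Python's `re` module (an assumption: Sigil's regex flavour should accept essentially the same pattern, but check before relying on it). A paragraph counts as a chapter label only if its whole text is one of the listed forms; the `NUMBER_WORDS` list is illustrative and deliberately incomplete.

```python
import re

# Illustrative, incomplete list of spelled-out chapter numbers (hypothetical).
NUMBER_WORDS = "ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|ELEVEN|TWELVE"

# A line is a chapter label only if it consists *entirely* of one of:
#   CHAPTER ONE / CHAPTER 1 / CHAPTER IV, a bare arabic number, or a bare
#   roman numeral. Matching is case-sensitive, so "Chapter" in prose is ignored.
CHAPTER_LABEL = re.compile(
    rf"""^\s*(?:
        CHAPTER\s+(?:\d+|{NUMBER_WORDS}|[IVXLC]+)  # CHAPTER 1 / ONE / IV
        | \d+                                     # bare arabic numeral
        | [IVXLC]+                                # bare roman numeral
    )\s*$""",
    re.VERBOSE,
)


def chapter_labels(lines):
    """Return the lines that look like stand-alone chapter labels."""
    return [line for line in lines if CHAPTER_LABEL.match(line)]


sample = ["CHAPTER ONE", "It was late.", "CHAPTER 1", "12", "IV",
          "I went home.", "Chapter 3"]
print(chapter_labels(sample))
```

The bare-roman-numeral branch is the weak spot: a paragraph consisting only of the pronoun "I", or of a capitalised word such as "CIVIL", would also match, so spot-check the hits before replacing them.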
01-15-2011, 03:06 AM   #4
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
I assume you've already tried enabling preprocess under structure detection. If you haven't that's one of the things it's designed to do. If you have books that aren't working then open a bug at bugs.calibre-ebook.com and attach some examples.
01-15-2011, 03:31 AM   #5
cybmole
Wizard
We've been around that loop in other threads: structure detection doesn't stand a chance if the chapter starts have the same HTML tags as the rest of the book. I can pick them out manually by reasoning that, say, a line containing only a roman numeral is a chapter start, or a line that begins with CHAPTER in caps, but I can't easily add those constructs to structure detection.

We could test: I should be able to find a non-patched-up book in my collection which has chapters of the form CHAPTER... inside simple <p> tags.
Give me the XPath expression for structure detection of that on an EPUB source, please.
When I try making them with the wizard and tweaking them, they either find far too much or nothing at all. I already figured out that the "i" bit sets case-insensitive matching, so I remove that, and I add class = bold if that's the chapter header style, but still no joy.

Last edited by cybmole; 01-15-2011 at 03:37 AM.
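One way to express "a <p> whose entire text is CHAPTER followed by one word" as an XPath, sketched with Python's lxml, which (like calibre's default expressions) supports the EXSLT `re:test` function. The expression and the sample markup are hypothetical; only the XPath string itself would go into calibre's field, and it should be tried on a real book first.

```python
from lxml import etree

# Prefix used by re:test(); it maps to the EXSLT regular-expressions namespace.
NS = {"re": "http://exslt.org/regular-expressions"}

# Case-sensitive (no 'i' flag): the whole paragraph must be "CHAPTER <word>",
# so "chapter" inside ordinary prose, or a sentence that merely starts with
# CHAPTER, is not matched.
CHAPTER_XPATH = r"//*[name()='p' and re:test(., '^\s*CHAPTER\s+\w+\s*$')]"

SAMPLE = """<html><body>
<p class="calibre1">CHAPTER ONE</p>
<p class="calibre1">He had never read that chapter before.</p>
<p class="calibre1">CHAPTER began the second line, oddly enough.</p>
<p class="calibre1">CHAPTER 2</p>
</body></html>"""


def chapter_starts(html):
    """Return the text of every element CHAPTER_XPATH selects."""
    root = etree.fromstring(html)
    return [el.text for el in root.xpath(CHAPTER_XPATH, namespaces=NS)]


print(chapter_starts(SAMPLE))
```

Anchoring with `^...$` is what keeps the match from finding "far too much": without it, any paragraph that merely contains the word would qualify.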
01-15-2011, 06:19 AM   #6
ldolse
Wizard
We've been around that loop before, and I asked for a sample that doesn't work, and you declined.

I have lots of books formatted exactly as you describe, and preprocess works just fine for me. One of the reasons I implemented the feature is to solve exactly the problem you're describing. That said, I'm sure there are books where it doesn't work today. I'm happy to enhance the function when possible, but without examples of it not working there isn't much I can do.
01-15-2011, 07:37 AM   #7
cybmole
Wizard
I will go find a couple now. Can you PM me with an email address to send them to, as I can't attach books here?
01-15-2011, 07:48 AM   #8
cybmole
Wizard
Here's one: HTML source, added to calibre, which converts it to a ZIP. I tried converting to EPUB with the relevant boxes ticked, but no chapters were found. The chapters are stand-alone digits.

Tell me once you have it and I will remove it.

Attachment deleted by moderator

Moderator Notice
IF YOU COPY COPYRIGHTED MATERIAL AGAIN TO MOBILEREAD YOU WILL BE BANNED. WE WILL NOT TOLERATE THIS UNDER ANY CIRCUMSTANCES. THIS IS YOUR ONE AND ONLY WARNING ON THE SUBJECT!

Last edited by HarryT; 01-15-2011 at 08:01 AM.
01-15-2011, 07:56 AM   #9
ldolse
Wizard
I think I've explained to you before - you can't put copyrighted content on mobileread, it's against the forum rules. That's why I asked you to open a bug at bugs.calibre-ebook.com and attach the files there.

Please remove the attachment. I'll look at the book and let you know what's going on shortly.
01-15-2011, 07:59 AM   #10
cybmole
Wizard
Here's a more typically frustrating one: LIT source. I convert both to EPUB and to MOBI, and I try with and without ticking "use auto-generated TOC". No TOC appears either way, for either output, when viewed with the calibre reader. But when I inspect the EPUB code I see h2 tags around the chapter numbers!
Code:
<h2 class="calibre2" id="calibre_pb_18">72</h2>
Attaching a zip of the source LIT. Can you get it to convert and to detect and show chapters?

Update: forget that one. It's an embarrassingly bad source now that I look at it, with lots of orphaned page numbers.

I'll look for an alternative.

Last edited by cybmole; 01-15-2011 at 08:11 AM.
01-15-2011, 08:14 AM   #11
ldolse
Wizard
The preprocess option found the chapters just fine. Just to be clear, when I tell you to enable preprocess, you need to check the box next to 'preprocess input file to possibly improve structure detection'.

All the numeric chapter headings were wrapped in <h2> tags by the preprocess function (except the first one, which uses the letter 'I' instead of a '1').
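The idea behind that preprocessing step can be illustrated with a deliberately tiny Python sketch. This is a hypothetical simplification, not calibre's heuristic code: it only rewrites a `<p>` whose entire content is an arabic number as an `<h2>`, which also shows why a first chapter labelled 'I' would be missed by a digits-only rule.

```python
import re

# Hypothetical, much-simplified stand-in for the preprocess heuristic:
# a <p> whose whole content is an arabic number becomes an <h2>,
# keeping the original attributes (group 1) and the number (group 2).
STANDALONE_NUMBER_P = re.compile(r"<p([^>]*)>\s*(\d+)\s*</p>")


def tag_numeric_chapters(html):
    """Wrap stand-alone numeric paragraphs in <h2> instead of <p>."""
    return STANDALONE_NUMBER_P.sub(r"<h2\1>\2</h2>", html)


sample = ('<p class="calibre1">I</p>'
          '<p class="calibre1">Text.</p>'
          '<p class="calibre1">2</p>')
print(tag_numeric_chapters(sample))
```

The roman "I" paragraph is left alone because `\d+` only matches digits, mirroring the one heading preprocess did not catch in this book.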

With the default settings your file WILL be split into a single file per chapter - this requires that you haven't changed the global defaults.

The default setting for 'insert page breaks before' should be:
Code:
//*[name()='h1' or name()='h2']
This is the default setting - if you see something different it means you must have changed it in the global preferences.

If you've gone and changed that then you won't get the document split into multiple flows at the correct points.
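To see what this default expression selects, and roughly what "split into multiple flows" means, here is a simplified Python/lxml model. It is not calibre's splitting code: it only groups the direct children of `<body>` into chunks that each begin at an element matched by the default page-break XPath, and the sample markup is made up.

```python
from lxml import etree

# The default "Insert page breaks before" expression quoted above.
PAGE_BREAK_XPATH = "//*[name()='h1' or name()='h2']"

SAMPLE = """<html><body>
<p>Front matter</p>
<h2>1</h2><p>First chapter text.</p>
<h2>2</h2><p>Second chapter text.</p>
</body></html>"""


def split_into_flows(html):
    """Rough model of splitting at page breaks: start a new chunk of <body>
    children at every element the XPath matches. Only direct children of
    <body> are considered; nested headings are ignored in this sketch."""
    root = etree.fromstring(html)
    body = root.find("body")
    break_positions = {
        body.index(el)
        for el in root.xpath(PAGE_BREAK_XPATH)
        if el.getparent().tag == "body"
    }
    flows = [[]]
    for position, child in enumerate(body):
        # A page-break element opens a new flow (unless the current one is empty).
        if position in break_positions and flows[-1]:
            flows.append([])
        flows[-1].append(child.text)
    return flows


print(split_into_flows(SAMPLE))
```

If the global setting had been changed to an expression that never matches the inserted `<h2>` tags, `break_positions` would be empty and everything would stay in one flow.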

TOC creation will not happen automatically (with the default settings) with these types of chapter headings, but that's trivial to fix. All of these chapter headings are numeric - that can be represented using regular expressions with \d+

So change the default TOC xpath from this:
Code:
//*[((name()='div' or name()='h2') and re:test(., 'chapter|book|section|part\s+', 'i')) or @class = 'chapter']

To this:
Code:
//*[((name()='h1' or name()='h2') and re:test(., '\d+', 'i')) or @class = 'chapter']
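A quick way to compare the two chapter-detection expressions is to run both over a fragment shaped like the `<h2 class="calibre2" ...>72</h2>` markup quoted earlier in the thread. The sketch below uses Python's lxml, which supports the same EXSLT `re:test` function; the sample fragment is hypothetical.

```python
from lxml import etree

NS = {"re": "http://exslt.org/regular-expressions"}

# The two "Detect chapters at" expressions quoted above, verbatim.
DEFAULT_TOC_XPATH = r"//*[((name()='div' or name()='h2') and re:test(., 'chapter|book|section|part\s+', 'i')) or @class = 'chapter']"
NUMERIC_TOC_XPATH = r"//*[((name()='h1' or name()='h2') and re:test(., '\d+', 'i')) or @class = 'chapter']"

# Hypothetical output of the preprocess step: numeric headings wrapped in <h2>.
SAMPLE = """<html><body>
<h2 class="calibre2" id="calibre_pb_18">72</h2>
<p class="calibre1">They sailed with 3 ships.</p>
<h2 class="calibre2" id="calibre_pb_19">73</h2>
</body></html>"""


def toc_entries(xpath, html):
    """Return the text of every element the given chapter XPath picks up."""
    root = etree.fromstring(html)
    return [el.text for el in root.xpath(xpath, namespaces=NS)]


print(toc_entries(DEFAULT_TOC_XPATH, SAMPLE))  # no chapter/book/section/part text
print(toc_entries(NUMERIC_TOC_XPATH, SAMPLE))
```

Because `\d+` is unanchored, any h1/h2 containing a digit anywhere (a heading such as "Summer 1943", say) would also become a TOC entry; a stricter variant of the test would be `re:test(., '^\s*\d+\s*$')`.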

If you can't figure out how to fix the xpath, preprocessing did work anyway. All you would need to do is open this document up in Sigil and then save it, and you would automatically get a TOC, since Sigil autogenerates a TOC based on heading tags, and preprocess successfully inserted them.

Last edited by ldolse; 01-15-2011 at 08:18 AM.
01-15-2011, 08:17 AM   #12
ldolse
Wizard
.....

Last edited by ldolse; 01-15-2011 at 08:22 AM.
01-15-2011, 08:38 AM   #13
cybmole
Wizard
Right, let's diagnose this before I get lynched by the mods :-)

I have the same zip as you. I tick "restore defaults" on all the conversion screens and convert to EPUB, then view that in the calibre viewer: the only TOC entry is "start".

Repeat, but tick auto-generate TOC on the TOC screen... same result.

Open the EPUB with Sigil and I can see that structure detection has done its thing: it has added the h2 tags and has created one file per chapter, but I am not seeing the benefit in the form of a viewable TOC.

So we are both apparently right: your structure detection code IS working, but I am not getting a TOC to view...

AHA

I think you actually explained that, two posts up... I need to open and close the EPUB with Sigil to complete the TOC generation, and that works and is fine for me. Others may want a calibre-only solution, and that would be to modify the XPath?

I was maybe close to the answer but still in the dark!

So, model answer:
Step 1: enable structure detection (no need to enable auto-generate TOC) and convert to EPUB.
At this stage the chapters have been tagged and the EPUB has been split with one file per chapter.
Step 2: open the EPUB in Sigil, save it, and close it.

One final query on XPath: I come unstuck if I try to modify the end bit (the @class = 'chapter' bit) in order to add an AND. Is that because it is not possible?

Say I wanted to replace the @class = 'chapter' test with a test for bold roman numerals, so I'd want something like (@class = 'bold' AND re:test([XIV]+)). I'm not too clear on the re:test syntax, but I have tried lots of invalid permutations!

Is that doable, and could you give a model answer for that, please?

Last edited by cybmole; 01-15-2011 at 08:55 AM.
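The combination asked about here is expressible in XPath 1.0 as long as the operator is written in lowercase (`and`); an uppercase `AND` is not an operator, so the whole expression is rejected as invalid. A minimal sketch, checked with Python's lxml (which, like calibre's default expressions, supports EXSLT `re:test`); the expression, the `bold` class name and the sample markup are assumptions, so verify against a real book:

```python
from lxml import etree

NS = {"re": "http://exslt.org/regular-expressions"}

# Hypothetical expression: numeric h1/h2 headings, OR elements whose class is
# exactly "bold" and whose whole text is a roman numeral (case-sensitive).
TOC_XPATH = (
    r"//*[((name()='h1' or name()='h2') and re:test(., '\d+'))"
    r" or (@class = 'bold' and re:test(., '^\s*[IVXLC]+\s*$'))]"
)

# The same idea with an uppercase AND, which XPath does not accept.
BAD_XPATH = r"//*[@class = 'bold' AND re:test(., '[XIV]+')]"

SAMPLE = """<html><body>
<p class="bold">IV</p>
<p class="bold">Important notice</p>
<p class="calibre1">XI</p>
<h2>12</h2>
</body></html>"""


def is_valid_xpath(expr):
    """True if lxml can compile the expression."""
    try:
        etree.XPath(expr, namespaces=NS)
        return True
    except etree.XPathError:
        return False


def toc_entries(html):
    root = etree.fromstring(html)
    return [el.text for el in root.xpath(TOC_XPATH, namespaces=NS)]


print(is_valid_xpath(TOC_XPATH), is_valid_xpath(BAD_XPATH))
print(toc_entries(SAMPLE))
```

Note that `@class = 'bold'` only matches when the class attribute is exactly `bold`; if the paragraphs carry several classes (e.g. `class="bold calibre3"`), the usual XPath 1.0 idiom is `contains(concat(' ', normalize-space(@class), ' '), ' bold ')`.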
01-15-2011, 08:44 AM   #14
ldolse
Wizard
The default settings will only split on H1 and H2 tags. It won't auto-magically put the chapters in a TOC. For that you need the right xpath.

That's why I told you to change 'Detect chapters at (XPath expression)'.

Change it from the default to this:
Code:
//*[((name()='h1' or name()='h2') and re:test(., '\d+', 'i')) or @class = 'chapter']
And if you can't figure out Xpath for other files just open the file in Sigil and re-save it after using the pre-process option, you'll automatically get a TOC.

In the future use Calibre's bug reporting system if you have a file that you think doesn't work.
01-15-2011, 09:10 AM   #15
cybmole
Wizard
Thank you for your patience. It IS getting clearer gradually.

To sum up so far:

Begin by converting the source, in whatever format, to EPUB.

Special case: if the source is already EPUB but lacks header tags, make a copy of it, rename that copy to .zip, add it back to calibre as a .zip, discard the bad EPUB, and continue as per the method for non-EPUB sources below.

The purpose of structure detection (amongst other things) is to identify and tag chapter starts, but it does not, by itself, create a TOC.

The XPath line defines how to identify which elements to place in the TOC.

If I can't customise the XPath successfully, then I should use Sigil: open, (patch with regex if needed), save.

The auto-generate TOC option on the TOC screen is not relevant at this (create a good EPUB) stage, and should not be needed thereafter when I convert the EPUB to MOBI for its final destination on Kindle.

Yes, I'll go learn how to use the bug reports.

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Losing Chapter PageBreaks in .MOBI on Kindle | XRaySpeX | Calibre | 7 | 12-10-2010 05:29 AM
Unwanted Pagebreaks | Timoleon | Calibre | 3 | 09-19-2010 07:53 PM
Pagebreaks for RTF to mobi question | mputtr | Calibre | 1 | 03-17-2010 04:20 AM
Chapters showing unwanted pagebreaks and < h1 > text | raltman | Calibre | 2 | 10-05-2009 04:50 PM
Epub and pagebreaks | mtravellerh | Calibre | 3 | 11-02-2008 05:30 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.