Old 07-26-2010, 08:17 PM   #2
wallcraft
reader
Posts: 6,977
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
Quote:
Originally Posted by Humble View Post
I have read quite a few threads before posting, but they did not help me. I am trying to create a table of contents for my books. Can someone explain how to do this in layman's terms? I went to the XPath tutorial and I don't understand all that stuff. Can anyone explain the simplest way to get a table of contents in my books?
The default (Structure Detection) is:
Code:
//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\s+', 'i')) or @class = 'chapter']
This means calibre treats as a chapter start any <h1> or <h2> tag whose text contains "chapter", "book" or "section", or "part" followed by whitespace (in any mixture of upper and lower case), and also any element at all that has the class="chapter" attribute.
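To see which heading texts the regular expression in the default accepts, here is a minimal Python sketch. It assumes re:test behaves like Python's re.search (calibre evaluates XPath through lxml, whose EXSLT regex functions are built on Python's re module). The helper name and the sample headings are made up for illustration.

```python
import re

# The pattern from the default chapter expression. Note that \s+ belongs
# only to the last alternative ("part"), not to the other three words.
DEFAULT_PATTERN = r'chapter|book|section|part\s+'

def heading_matches(text, pattern=DEFAULT_PATTERN):
    """Hypothetical helper: mimic re:test(., pattern, 'i') on a heading's text."""
    return re.search(pattern, text, re.IGNORECASE) is not None

samples = ["Chapter 1", "BOOK TWO", "Part One", "Part", "Prologue", "My Notebook"]
for text in samples:
    print(f"{text!r}: {heading_matches(text)}")
```

In this sketch a heading that is just "Part" is rejected because nothing follows it, while "My Notebook" is accepted because the words are searched for anywhere in the text, not only as whole words.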

If you are editing the ebooks yourself, just put the chapter headings in <h1> or <h2> tags with a word such as Chapter in the heading, and/or give them the class 'chapter'. Otherwise, see below for other XPath settings you might use.

When generating a TOC for purchased ebooks, I have found that different ebooks need different XPath expressions.

Versions that select all <h1> and <h2> tags, or all <h1>, <h2> and <h3> tags:
Code:
//*[name()='h1' or name()='h2']

//*[name()='h1' or name()='h2' or name()='h3']
A version like the default that also matches any digit in the tag contents:
Code:
//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\s+|0|1|2|3|4|5|6|7|8|9', 'i')) or @class = 'chapter']
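As a rough check of what the digit alternatives in the expression above add, this Python sketch compares the long pattern with a shorter character-class spelling. It assumes re:test follows Python's re.search semantics; the shorter pattern and the helper name are my own illustration, not something taken from calibre.

```python
import re

# Pattern from the expression above: the default words plus any single digit.
DIGIT_PATTERN = r'chapter|book|section|part\s+|0|1|2|3|4|5|6|7|8|9'
# Assumed-equivalent shorter spelling using a character class.
SHORT_PATTERN = r'chapter|book|section|part\s+|[0-9]'

def matches(pattern, text):
    """Hypothetical helper: case-insensitive search, like re:test(., pattern, 'i')."""
    return re.search(pattern, text, re.IGNORECASE) is not None

for text in ["1", "12. The Storm", "Chapter Two", "XII", "Epilogue"]:
    print(f"{text!r}: {matches(DIGIT_PATTERN, text)} / {matches(SHORT_PATTERN, text)}")
```

Headings numbered with Roman numerals only, such as "XII", are not picked up by either spelling.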
A version that looks for tag contents that are all capitals (no lowercase letters):
Code:
//*[((name()='h1' or name()='h2') and re:test(., '^[^a-z]+$')) or @class = 'chapter']
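The all-capitals test above really means "contains no lowercase a-z", which is a little broader than it sounds. A small Python sketch of the same pattern, again assuming re:test works like Python's re.search (the helper name and samples are hypothetical):

```python
import re

# Same pattern as in the expression above; no 'i' flag, so case matters.
ALL_CAPS_PATTERN = r'^[^a-z]+$'

def is_all_caps(text):
    """Hypothetical helper: mimic re:test(., '^[^a-z]+$')."""
    return re.search(ALL_CAPS_PATTERN, text) is not None

for text in ["THE LONG ROAD", "The Long Road", "1984", "* * *", ""]:
    print(f"{text!r}: {is_all_caps(text)}")
```

Headings made up only of digits or punctuation, such as "1984" or a "* * *" scene break, also pass this test, so check the resulting TOC for stray entries.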
Any element, or only <p> tags, whose text starts with "Chapter" followed by a space (h: is the prefix calibre uses for the XHTML namespace):
Code:
//*[re:test(., '^chapter ', 'i')]

//h:p[re:test(., '^chapter ', 'i')]
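Because of the ^ anchor and the trailing space, this test is stricter than the earlier ones. The Python sketch below shows the effect, assuming re:test follows Python's re.search semantics; the helper name and sample strings are invented for illustration.

```python
import re

def starts_with_chapter(text):
    """Hypothetical helper: mimic re:test(., '^chapter ', 'i')."""
    return re.search(r'^chapter ', text, re.IGNORECASE) is not None

for text in ["Chapter 1", "CHAPTER ONE", "Chapter1", " Chapter 1", "In Chapter 1"]:
    print(f"{text!r}: {starts_with_chapter(text)}")
```

In this sketch, text with leading whitespace or with no space after "Chapter" is rejected, so if nothing is detected, look at the exact text inside the tags of your ebook.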
Sometimes I first run the book through calibre once (with --pretty-print), and if that does not produce a good TOC I run it through again, keying on one of the classes calibre generated. Often calibre1 is what is needed, alone or combined with a test like those above, but unzip the epub and look inside to see what your book needs:
Code:
//*[@class = 'calibre1']

//*[@class = 'calibre1' and re:test(., 'chapter|book|section|part\s+|0|1|2|3|4|5|6|7|8|9', 'i')] 

//*[@class = 'calibre1' and re:test(., '^[^a-z]+$')]
With any of these, I sometimes also need --use-auto-toc, which makes calibre use the auto-generated TOC even when the ebook already has one. Don't use it routinely, though, because the existing TOC may be fine.
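If you convert from the command line, these settings correspond to ebook-convert options. The Python sketch below only assembles the argument list and does not run anything. It assumes the XPath is passed with the --chapter option (the Structure Detection setting), the file names are placeholders, and the function name is hypothetical.

```python
def build_convert_command(source, target, chapter_xpath, use_auto_toc=False):
    """Hypothetical helper: build an ebook-convert argument list (does not run it)."""
    command = ["ebook-convert", source, target, "--chapter", chapter_xpath]
    if use_auto_toc:
        # Prefer the auto-generated TOC even if the book already has one.
        command.append("--use-auto-toc")
    return command

command = build_convert_command(
    "book.epub", "book_toc.epub",
    "//*[name()='h1' or name()='h2']",
    use_auto_toc=True,
)
print(command)
# To actually run it (requires calibre to be installed):
#   import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the brackets and quotes in the XPath expression.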

Last edited by wallcraft; 07-26-2010 at 08:20 PM.