MobileRead Forums - View Single Post

wallcraft · 07-26-2010, 08:17 PM

Quote:

Originally Posted by Humble

I have read quite a few threads before posting, but they do not help me. I am trying to create a table of contents for my books. Can someone explain how to do this in layman's terms? I went to the XPath tutorial and I don't understand all that stuff. Can anyone explain the simplest way to get a table of contents in my books?

The default (Structure Detection) is:

Code:

//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\s+', 'i')) or @class = 'chapter']

What it means is that calibre will assume a chapter starts at any <h1> or <h2> tag that contains one of the words chapter, book, section or part (in any mixture of upper and lower case; "part" has to be followed by whitespace), or at any tag at all that has the class="chapter" attribute.
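To see what that test accepts and rejects, here is a rough Python sketch of the same logic. It is only an approximation: calibre evaluates the XPath itself, and looks_like_chapter is a made-up helper name, not part of calibre. It assumes re:test does an unanchored search, which is how EXSLT regular expressions normally behave.

```python
import re

# Regex copied from the default expression quoted above.
DEFAULT_PATTERN = r"chapter|book|section|part\s+"


def looks_like_chapter(tag, text, css_class=None):
    """Approximate the default chapter test for a single element.

    Hypothetical helper for illustration; calibre runs the XPath directly.
    """
    # (h1 or h2) AND the text contains one of the words, case-insensitively
    heading_hit = tag in ("h1", "h2") and bool(
        re.search(DEFAULT_PATTERN, text, re.IGNORECASE)
    )
    # OR any element at all whose class attribute is exactly "chapter"
    return heading_hit or css_class == "chapter"


print(looks_like_chapter("h1", "CHAPTER ONE"))         # True
print(looks_like_chapter("h2", "Part II"))             # True
print(looks_like_chapter("h1", "Part"))                # False: \s+ needs whitespace after "part"
print(looks_like_chapter("h3", "Chapter 3"))           # False: h3 is not checked
print(looks_like_chapter("p", "Anything", "chapter"))  # True: the class alone is enough
```

Note that a heading like "Prologue" or "1" would not be picked up by the default, which is one reason the variations below can be needed.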

If you are editing the ebooks yourself, just put the chapter headings in h1 or h2 tags with Chapter (say) in the heading, and/or give them the class 'chapter'. Otherwise, see below for other XPath settings you might use.

When generating a TOC for purchased ebooks, I have found that different ebooks need different XPath values.

Versions that select all <h1> and <h2> (and <h3>) tags:

Code:

//*[name()='h1' or name()='h2']

//*[name()='h1' or name()='h2' or name()='h3']

A version like the default that also matches headings containing any digit:

Code:

//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\s+|0|1|2|3|4|5|6|7|8|9', 'i')) or @class = 'chapter']

A version that looks for tag contents that are all capitals (no lowercase letters):

Code:

//*[((name()='h1' or name()='h2') and re:test(., '^[^a-z]+$')) or @class = 'chapter']
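One thing to watch with the no-lowercase test: it accepts anything without a-z in it, not just capital letters. A small Python sketch of the same regex (has_no_lowercase is just an illustrative name, and Python's re module stands in for calibre's re:test here):

```python
import re

# Regex from the all-capitals expression above; note there is no 'i' flag.
NO_LOWERCASE = r"^[^a-z]+$"


def has_no_lowercase(text):
    """True when the text is non-empty and contains no a-z characters."""
    return bool(re.search(NO_LOWERCASE, text))


for text in ["THE LONG ROAD", "1984", "* * *", "The Long Road"]:
    print(repr(text), has_no_lowercase(text))
```

The first three come out True and only the last one False, so scene-break markers or bare numbers inside h1/h2 tags would also become TOC entries with this expression.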

Any element (or just <p> tags) starting with Chapter:

Code:

//*[re:test(., '^chapter ', 'i')]

//h:p[re:test(., '^chapter ', 'i')]
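The ^ in these two anchors the match to the very start of the element's text, and the trailing space is part of the pattern. A quick Python sketch of what that implies (starts_with_chapter is a hypothetical helper, and whether your book's tags carry leading whitespace is something you would have to check in the actual files):

```python
import re

# Regex from the expressions above, used with the 'i' (ignore case) flag.
STARTS_WITH_CHAPTER = r"^chapter "


def starts_with_chapter(text):
    """True when the text begins with 'chapter' plus a space, in any case."""
    return bool(re.search(STARTS_WITH_CHAPTER, text, re.IGNORECASE))


print(starts_with_chapter("Chapter 1"))           # True
print(starts_with_chapter("CHAPTER ONE"))         # True
print(starts_with_chapter("Chapter1"))            # False: no space after the word
print(starts_with_chapter("  Chapter 1"))         # False: leading whitespace defeats ^
print(starts_with_chapter("In this chapter we"))  # False: not at the start
```

If headings in your book are not being found, leading whitespace or a line break before the word Chapter is worth ruling out.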

Sometimes I first run the book once through calibre (with --pretty-print), and if that does not produce a good TOC I run it through again, keying on one of the classes calibre generated. Often calibre1 is what is needed, either alone or combined with a test like those used above, but unzip the epub and look inside to see what is needed in your case:

Code:

//*[@class = 'calibre1']

//*[@class = 'calibre1' and re:test(., 'chapter|book|section|part\s+|0|1|2|3|4|5|6|7|8|9', 'i')]

//*[@class = 'calibre1' and re:test(., '^[^a-z]+$')]

With any of these, I sometimes also need --use-auto-toc, which makes calibre use the generated TOC instead of the one already in the book. Don't use it routinely, though, because the existing TOC may be fine.
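If you convert from the command line, the pieces fit together roughly as in the Python sketch below. The assumptions are mine: that the XPath goes to ebook-convert's --chapter option (the command-line counterpart of the Structure Detection setting), and the file names and the build_convert_command helper are placeholders.

```python
def build_convert_command(src, dst, chapter_xpath, force_auto_toc=False):
    """Build an ebook-convert argument list (hypothetical helper).

    Assumes the chapter XPath is passed with --chapter.
    """
    cmd = ["ebook-convert", src, dst, "--chapter", chapter_xpath, "--pretty-print"]
    if force_auto_toc:
        # Prefer the generated TOC even if the book already has one
        cmd.append("--use-auto-toc")
    return cmd


# Placeholder file names; substitute your own.
cmd = build_convert_command(
    "book.epub",
    "book-toc.epub",
    "//*[name()='h1' or name()='h2']",
    force_auto_toc=True,
)
print(cmd)
# To actually run it (needs calibre installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids having to escape the quotes and brackets in the XPath for the shell.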