Calibre settings for using 'chapter' in TOC

TargonD · 10-20-2010, 06:43 PM

Hello,
I have been looking through the forums, and while I have found a way to use 'chapter' as the tag for creating the TOC, I am getting a lot of extra chapter hits that have nothing to do with 'chapter'. Looking at the tutorial, it seems that //*[@class="chapter"] would do what I want, but I have no idea whether this should go into Detect Chapters under Structure Detection or Level 1 TOC under Table of Contents.
I have looked at the XPath information and the links that have been provided, but I really don't understand what the difference would be.

Thanks,

TargonD · 10-20-2010, 09:34 PM

OK, after some work, and using information that I found in another post (I'm not sure which one) with a little modification, I ended up with this for Detect Chapters:

//*[((name()='p') and re:test(., 'CHAPTER', ""))]

The books I have been converting to EPUB were in RTF format, so I made all of the chapter headers capitals. I removed the 'i' so that the match would be case sensitive. Then I used the Level 1 TOC wizard to get:

//h:h1[re:test(@class, "chapter", "i")]

Not sure why this works, but I have done three books and all had a correct TOC with no extras.

I don't know if anyone is interested - but with all of the information that I have found on this site, I wanted to give a little back. I hope this helps some.

ldolse · 10-20-2010, 09:40 PM

The preprocess option could probably also help you in this case; it's one of the options under Structure Detection. Use that with the default chapter detection settings.

jackie_w · 10-20-2010, 09:43 PM

When I have a book which has just a single level of TOC, I enter the same setting in both 'Structure Detection - Detect Chapters' and 'TOC Level 1'. I don't know whether this is the best thing to do, but it's always worked for me.

Manichean · 10-21-2010, 05:19 AM

Quote:

Originally Posted by TargonD

//*[((name()='p') and re:test(., 'CHAPTER', ""))]

Just as a general comment, that expression, if I'm not mistaken, selects <p> elements whose text contains the capitalized word "CHAPTER". You might, in general, want to ignore case, as not all books capitalize their chapter headings. To do that, just include the flag "i" as below:

Code:

//*[((name()='p') and re:test(., 'CHAPTER', "i"))]
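For anyone unsure what that third argument changes, here is a rough Python sketch of the same test. It assumes that re:test with the "i" flag behaves like Python's re.search with re.IGNORECASE, which is my understanding rather than something confirmed in this thread. The sample paragraphs and the helper name matches are made up for illustration.

```python
import re

# Hypothetical paragraph texts, standing in for the contents of <p> tags.
paragraphs = ["CHAPTER ONE", "Chapter Two", "In this chapter we meet the hero."]

def matches(text, flags=""):
    """Rough stand-in for re:test(., 'CHAPTER', flags): True if the pattern occurs anywhere."""
    py_flags = re.IGNORECASE if "i" in flags else 0
    return re.search("CHAPTER", text, py_flags) is not None

# Empty flags: only the all-capitals heading is selected.
case_sensitive = [p for p in paragraphs if matches(p, "")]

# "i" flag: every paragraph that mentions the word in any case is selected.
case_insensitive = [p for p in paragraphs if matches(p, "i")]

print(case_sensitive)
print(case_insensitive)
```

With the "i" flag, the ordinary sentence that merely mentions "chapter" is selected too, which is the kind of extra hit the case-sensitive version avoids when the headings really are all capitals.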

Gwen Morse · 10-21-2010, 09:13 AM

Quote:

Originally Posted by Manichean

Just as a general comment, that expression, if I'm not mistaken, selects <p> elements whose text contains the capitalized word "CHAPTER". You might, in general, want to ignore case, as not all books capitalize their chapter headings. To do that, just include the flag "i" as below:

Code:

//*[((name()='p') and re:test(., 'CHAPTER', "i"))]

Is there an expression that will match Roman Numerals inside a paragraph tag?

Right now I have to search through the document by hand and prepend the word "Chapter" to each numeral.

I'd like something that would match it on its own.

Manichean · 10-21-2010, 09:36 AM

Quote:

Originally Posted by Gwen Morse

Is there an expression that will match Roman Numerals inside a paragraph tag?

Right now I have to search through the document by hand and prepend the word "Chapter" to each numeral.

I'd like something that would match it on its own.

Hm. Roman numerals contain combinations of "I", "V", "X" and maybe "L" (I don't think you'd have to go higher with books). So, the expression

Code:

[IVXL]+

should do the job.

Edit to add: There's a tutorial for regular expressions and XPath in the Calibre manual that might help.

Gwen Morse · 10-21-2010, 09:44 AM

Quote:

Originally Posted by Manichean

Hm. Roman numerals contain combinations of "I", "V", "X" and maybe "L" (I don't think you'd have to go higher with books). So, the expression

Code:

[IVXL]+

should do the job.

Edit to add: There's a tutorial for regular expressions and XPath in the Calibre manual that might help.

D'oh, I didn't think of just matching the letters used. I was thinking there might be some sort of "Roman Numeral Conversion", probably because Calibre converts to Roman Numerals for the series if you choose that option.

Manichean · 10-21-2010, 10:12 AM

Quote:

Originally Posted by Gwen Morse

D'oh, I didn't think of just matching the letters used. I was thinking there might be some sort of "Roman Numeral Conversion", probably because Calibre converts to Roman Numerals for the series if you choose that option.

I don't think that can be done. Regular expressions, which is what the matching here uses, have no concept of anything other than characters: letters, numerals, punctuation and anything else. For example, the standard digit class, \d, is to be interpreted as a set of characters representing (Arabic) numerals, not as the numbers themselves.
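To illustrate the point, here is a small Python sketch (not anything Calibre does internally, as far as I know). The regular expression only recognises the characters, and a separate, hypothetical helper called roman_to_int has to work out the value.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(numeral):
    """Convert a Roman numeral string to an integer (no validity checking)."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # A smaller value before a larger one is subtracted (IV = 4, XL = 40).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total

heading = "XIV"
# The regex only checks that the text consists of numeral letters.
found = re.fullmatch(r"[IVXLCDM]+", heading)
# The numeric value comes from ordinary code, not from the regex.
number = roman_to_int(found.group(0))
print(number)
```

The same split applies to \d: the pattern finds the digit characters, and turning them into a number is a separate step such as int().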

theducks · 10-21-2010, 10:16 AM

Quote:

Originally Posted by Manichean

Hm. Roman numerals contain combinations of "I", "V", "X" and maybe "L" (I don't think you'd have to go higher with books). So, the expression

Code:

[IVXL]+

should do the job.

Edit to add: There's a tutorial for regular expressions and XPath in the Calibre manual that might help.

L.E. Modesitt Jr. easily makes it to 'CL'

in the Recluce series.

Manichean · 10-21-2010, 10:29 AM

Quote:

Originally Posted by theducks

L.E. Modesitt Jr. easily makes it to 'CL'

in the Recluce series.

Okay, then using

Code:

[IVXLCDM]+

should at least take care of most books.

Gwen Morse · 10-30-2010, 11:46 PM

Quote:

Originally Posted by Manichean

Hm. Roman numerals contain combinations of "I", "V", "X" and maybe "L" (I don't think you'd have to go higher with books). So, the expression

Code:

[IVXL]+

should do the job.

Edit to add: There's a tutorial for regular expressions and XPath in the Calibre manual that might help.

The title: "THE LAST UNICORN" all in capitals triggers a match, like so:

<h2 id="calibre_toc_2" class="calibre4">THE LAST UNICORN</h2>

Otherwise, it does match all the roman numerals in this book and no other unwanted lines.

Manichean · 10-31-2010, 05:19 AM

Quote:

Originally Posted by Gwen Morse

The title: "THE LAST UNICORN" all in capitals triggers a match, like so:

<h2 id="calibre_toc_2" class="calibre4">THE LAST UNICORN</h2>

Otherwise, it does match all the roman numerals in this book and no other unwanted lines.

Well, technically, and I should have thought of that, anything containing one of those uppercase letters gets matched. Or, to be precise, the uppercase letters inside the words get matched. To avoid this, look at the occurrences of the Roman numerals, find e.g. what tags they are enclosed in, and alter the expression to match those as well.
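One way to tighten the pattern is to require that the whole text consists of numeral letters, instead of allowing a match anywhere. Here is a minimal Python sketch of the difference. The sample texts are made up, and it assumes the matching behaves like Python's re.search.

```python
import re

# Hypothetical element texts: three chapter numerals, a title, and body text.
texts = ["I", "XIV", "CL", "THE LAST UNICORN", "It was dark."]

# Unanchored: succeeds if any numeral letter appears anywhere in the text.
loose = re.compile(r"[IVXLCDM]+")

# Anchored: the entire text (ignoring surrounding whitespace) must be numeral letters.
strict = re.compile(r"^\s*[IVXLCDM]+\s*$")

loose_hits = [t for t in texts if loose.search(t)]
strict_hits = [t for t in texts if strict.search(t)]

print(loose_hits)
print(strict_hits)
```

In Calibre, the equivalent would presumably be something like //h:p[re:test(., '^\s*[IVXLCDM]+\s*$')], but that exact expression is untested here. Even the anchored form would still catch a paragraph consisting of a single word such as "MIX", so matching on the enclosing tag as well remains a good idea.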

Gwen Morse · 11-03-2010, 08:57 PM

Quote:

Originally Posted by Manichean

Well, technically, and I should have thought of that, everything containing one of the uppercase letters gets matched. Or, to be precise, the uppercase letters in the words get matched. To avoid this, look at the occurrences of the roman numerals, find e.g. what tags they are enclosed in and alter the expression to match those as well.

Will do that when I get home and look things over.

Other books have since triggered false matches. I'd only used it on a few books, and I was lucky that few of them had capitals that got caught.
