10-20-2010, 06:43 PM   #1
TargonD
Junior Member
 
Posts: 2
Karma: 12
Join Date: Oct 2010
Device: nook
Calibre settings for using 'chapter' in TOC

Hello,
I have been looking through the forums, and while I have found a way to use "chapter" as the tag for creating the TOC, I am also getting a lot of extra hits on text that is not actually a chapter heading. From the tutorial, it seems that //*[@class="chapter"] would do what I want, but I have no idea whether it should go into Detect Chapters under Structure Detection or into Level 1 TOC under Table of Contents.
I have looked at the Xpath information and links that have been provided, but I really don't understand what the difference would be.

Thanks,
10-20-2010, 09:34 PM   #2
TargonD
Junior Member
 
Posts: 2
Karma: 12
Join Date: Oct 2010
Device: nook
Found answer

OK, after some work, and using information I found in another post (I'm not sure which one), I ended up with this for Detect Chapters after a little modification:

//*[((name()='p') and re:test(., 'CHAPTER', ""))]

The books I have been converting to EPUB were in RTF format, so I capitalized all of the chapter headings. I removed the 'i' flag so that the match would be case-sensitive. Then I used the Level 1 TOC wizard to get:

//h:h1[re:test(@class, "chapter", "i")]

Not sure why this works, but I have done three books and all had a correct TOC with no extras.
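To see what each of the two expressions above selects, here is a rough emulation using only Python's standard library (`re` plus `xml.etree.ElementTree`). It is not calibre's XPath engine, and the sample markup (including the `<h1 class="chapter">` element) and the helper names are hypothetical, chosen only to illustrate the difference between the two settings.

```python
import re
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical sample markup standing in for a converted book.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p>CHAPTER ONE</p>
<p>She opened the chapter of the old book.</p>
<h1 class="chapter">Chapter Two</h1>
<h1 class="title">The Book</h1>
</body></html>"""

def all_text(el):
    # Like XPath's "." for an element: all descendant text joined together.
    return "".join(el.itertext())

def detect_chapters(root):
    # Approximates //*[((name()='p') and re:test(., 'CHAPTER', ""))]:
    # <p> elements whose text contains "CHAPTER", case-sensitive.
    return [p for p in root.iter(XHTML + "p") if re.search("CHAPTER", all_text(p))]

def level1_toc(root):
    # Approximates //h:h1[re:test(@class, "chapter", "i")]:
    # <h1> elements whose class attribute contains "chapter", ignoring case.
    return [h for h in root.iter(XHTML + "h1")
            if re.search("chapter", h.get("class", ""), re.IGNORECASE)]

root = ET.fromstring(SAMPLE)
print([all_text(e) for e in detect_chapters(root)])  # ['CHAPTER ONE']
print([all_text(e) for e in level1_toc(root)])       # ['Chapter Two']
```

The first expression looks at the text inside paragraphs; the second looks only at the class attribute of `<h1>` elements, so ordinary prose mentioning "chapter" is ignored by both in this sample.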

I don't know if anyone is interested, but with all of the information I have found on this site, I wanted to give a little back. I hope this helps some.
10-20-2010, 09:40 PM   #3
ldolse
Wizard
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
The preprocess option could probably also help you in this case; it's one of the options under Structure Detection. Use it with the default chapter detection settings.
10-20-2010, 09:43 PM   #4
jackie_w
Grand Sorcerer
 
Posts: 6,208
Karma: 16534692
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
When I have a book which has just a single level of TOC, I enter the same setting in both 'Structure Detection - Detect Chapters' and 'TOC Level 1'. I don't know whether this is the best thing to do, but it's always worked for me.
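The same "one expression in both places" approach can also be expressed for calibre's command-line converter. This is a sketch that assumes calibre's `ebook-convert` tool and its `--chapter` (Structure Detection) and `--level1-toc` (Table of Contents) options, which come from calibre's conversion documentation rather than from this thread; the file names and the helper function are hypothetical.

```python
import shlex

def convert_command(src, dst, xpath):
    """Build an ebook-convert argument list that uses one XPath expression
    both for chapter detection (--chapter) and for level-1 TOC entries
    (--level1-toc)."""
    return ["ebook-convert", src, dst, "--chapter", xpath, "--level1-toc", xpath]

# Hypothetical file names; the XPath is TargonD's Detect Chapters expression,
# rewritten with double-quoted literals so it is easy to quote in a shell.
cmd = convert_command(
    "book.rtf", "book.epub",
    '//*[((name()="p") and re:test(., "CHAPTER", ""))]',
)
print(shlex.join(cmd))  # the equivalent shell command line

# To actually run it (requires calibre's command-line tools on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building an argument list instead of a single string avoids having to escape the brackets and quotes in the XPath by hand.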
10-21-2010, 05:19 AM   #5
Manichean
Wizard
 
 
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by TargonD View Post
//*[((name()='p') and re:test(., 'CHAPTER', ""))]
Just as a general comment, that expression, if I'm not mistaken, selects entries that have the capitalized word "CHAPTER" inside <p> tags. You might, in general, want to ignore case, as not all books capitalize their chapter headings. So, just include the flag "i" as below:
Code:
//*[((name()='p') and re:test(., 'CHAPTER', "i"))]
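The effect of the "i" flag can be checked outside calibre. This minimal sketch uses Python's `re` module as a stand-in for the regex test; the `re_test` helper is hypothetical, mimics only the "i" flag of EXSLT `re:test`, and the paragraph texts are made up.

```python
import re

def re_test(text, pattern, flags=""):
    # Mimics EXSLT re:test(string, pattern, flags), handling only the "i" flag:
    # True if pattern occurs anywhere in text.
    return re.search(pattern, text, re.IGNORECASE if "i" in flags else 0) is not None

# Hypothetical paragraph texts from differently formatted books.
paragraphs = ["CHAPTER 1", "Chapter 2", "chapter three", "He skipped to the next chapter."]

case_sensitive = [p for p in paragraphs if re_test(p, "CHAPTER", "")]
case_insensitive = [p for p in paragraphs if re_test(p, "CHAPTER", "i")]
print(case_sensitive)    # ['CHAPTER 1']
print(case_insensitive)  # all four, including the body-text sentence
```

This shows the trade-off in the thread: "i" catches headings in any capitalization, but also ordinary sentences that mention a chapter, which is why TargonD capitalized the headings and dropped the flag.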
10-21-2010, 09:13 AM   #6
Gwen Morse
Addict
 
Posts: 254
Karma: 59872
Join Date: Dec 2009
Location: New York, USA
Device: Kindle 3 (wifi) + nokia n900 tablet phone
Quote:
Originally Posted by Manichean View Post
Just as a general comment, that expression, if I'm not mistaken, selects entries that have the capitalized word "CHAPTER" inside <p> tags. You might, in general, want to ignore case, as not all books capitalize their chapter headings. So, just include the flag "i" as below:
Code:
//*[((name()='p') and re:test(., 'CHAPTER', "i"))]
Is there an expression that will match Roman Numerals inside a paragraph tag?

Right now I have to search through the document by hand and prepend the word "Chapter" to each numeral.

I'd like something that would match it on its own.
10-21-2010, 09:36 AM   #7
Manichean
Wizard
 
 
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by Gwen Morse View Post
Is there an expression that will match Roman Numerals inside a paragraph tag?

Right now I have to search through the document by hand and prepend the word "Chapter" to each numeral.

I'd like something that would match it on its own.
Hm. Roman numerals contain combinations of "I", "V", "X" and maybe "L" (I don't think you'd have to go higher with books). So, the expression
Code:
[IVXL]+
should do the job.

Edit to add: There's a tutorial for regular expressions and XPath in the Calibre manual that might help.

Last edited by Manichean; 10-21-2010 at 09:39 AM.
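A quick way to see what the character class `[IVXL]+` actually matches is Python's `re.findall` (used here only as a stand-in for calibre's regex engine; the sample numerals are made up). With only I, V, X and L available, the largest numeral it covers whole is LXXXIX (89); anything that needs a C is only partly matched.

```python
import re

# The character class from the post: one or more of I, V, X, L.
IVXL = re.compile(r"[IVXL]+")

for heading in ["XIV", "LXXXIX", "XC", "CL"]:
    # findall shows which runs of characters the class actually matches.
    print(heading, IVXL.findall(heading))
# XIV ['XIV']
# LXXXIX ['LXXXIX']
# XC ['X']
# CL ['L']
```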
10-21-2010, 09:44 AM   #8
Gwen Morse
Addict
 
Posts: 254
Karma: 59872
Join Date: Dec 2009
Location: New York, USA
Device: Kindle 3 (wifi) + nokia n900 tablet phone
Quote:
Originally Posted by Manichean View Post
Hm. Roman numerals contain combinations of "I", "V", "X" and maybe "L" (I don't think you'd have to go higher with books). So, the expression
Code:
[IVXL]+
should do the job.

Edit to add: There's a tutorial for regular expressions and XPath in the Calibre manual that might help.
D'oh, I didn't think of just matching the letters used. I was thinking there might be some sort of "Roman Numeral Conversion", probably because Calibre converts to Roman Numerals for the series if you choose that option.
10-21-2010, 10:12 AM   #9
Manichean
Wizard
 
 
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by Gwen Morse View Post
D'oh, I didn't think of just matching the letters used. I was thinking there might be some sort of "Roman Numeral Conversion", probably because Calibre converts to Roman Numerals for the series if you choose that option.
I don't think that can be done. Regular expressions, which is what the matching here uses, have no concept of anything other than characters; that includes letters, digits, punctuation and anything else. For example, the standard digit class, \d, is to be interpreted as a set of characters representing (Arabic) numerals, not as the numbers themselves.
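In code terms, the regex only finds the characters; turning a match such as "XLIV" into the number 44 is ordinary program logic outside anything the Detect Chapters expression does. A minimal sketch of such a conversion, with a hypothetical helper name, assuming the input is already a well-formed numeral (this is not a calibre feature):

```python
# Values of the individual Roman numeral characters.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(numeral):
    """Convert a well-formed Roman numeral such as 'XLIV' to an int.
    A character worth less than the one after it (the I in IV, the X in XL)
    is subtracted; every other character is added."""
    total = 0
    for i, ch in enumerate(numeral):
        value = VALUES[ch]
        if i + 1 < len(numeral) and value < VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total

print(roman_to_int("XLIV"))  # 44
print(roman_to_int("CL"))    # 150
```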
10-21-2010, 10:16 AM   #10
theducks
Well trained by Cats
 
 
Posts: 29,782
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Manichean View Post
Hm. Roman numerals contain combinations of "I", "V", "X" and maybe "L" (I don't think you'd have to go higher with books). So, the expression
Code:
[IVXL]+
should do the job.

Edit to add: There's a tutorial for regular expressions and XPath in the Calibre manual that might help.
L. E. Modesitt Jr. easily makes it to 'CL' in the Recluce series.
10-21-2010, 10:29 AM   #11
Manichean
Wizard
 
 
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by theducks View Post
L. E. Modesitt Jr. easily makes it to 'CL' in the Recluce series.
Okay, then using
Code:
[IVXLCDM]+
should at least take care of most books.
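A quick check of the widened class, again with Python's `re` module as a stand-in for calibre's regex engine. "CL" and "MCMXCIX" are made-up test strings; the title string comes from later in this thread. It shows both that larger numerals are now matched whole and that, without anchoring, any single capital I, V, X, L, C, D or M inside an ordinary word is enough for a match.

```python
import re

# The widened class: one or more of I, V, X, L, C, D, M.
ROMAN_CHARS = re.compile(r"[IVXLCDM]+")

print(ROMAN_CHARS.findall("CL"))       # ['CL']
print(ROMAN_CHARS.findall("MCMXCIX"))  # ['MCMXCIX']

# Unanchored, a search succeeds on any all-caps text containing those letters.
print(bool(ROMAN_CHARS.search("THE LAST UNICORN")))  # True
print(ROMAN_CHARS.findall("THE LAST UNICORN"))       # ['L', 'IC']
```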
10-30-2010, 11:46 PM   #12
Gwen Morse
Addict
 
Posts: 254
Karma: 59872
Join Date: Dec 2009
Location: New York, USA
Device: Kindle 3 (wifi) + nokia n900 tablet phone
Quote:
Originally Posted by Manichean View Post
Hm. Roman numerals contain combinations of "I", "V", "X" and maybe "L" (I don't think you'd have to go higher with books). So, the expression
Code:
[IVXL]+
should do the job.

Edit to add: There's a tutorial for regular expressions and XPath in the Calibre manual that might help.
The title: "THE LAST UNICORN" all in capitals triggers a match, like so:

<h2 id="calibre_toc_2" class="calibre4">THE LAST UNICORN</h2>

Otherwise, it does match all the roman numerals in this book and no other unwanted lines.
10-31-2010, 05:19 AM   #13
Manichean
Wizard
 
 
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by Gwen Morse View Post
The title: "THE LAST UNICORN" all in capitals triggers a match, like so:

<h2 id="calibre_toc_2" class="calibre4">THE LAST UNICORN</h2>

Otherwise, it does match all the roman numerals in this book and no other unwanted lines.
Well, technically (and I should have thought of that), anything containing one of those uppercase letters gets matched; to be precise, the uppercase letters inside the words get matched. To avoid this, look at where the Roman numerals occur, find e.g. which tags they are enclosed in, and alter the expression to match those as well.
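A sketch of that idea, emulated with Python's standard library rather than calibre. It restricts the match to one tag, as suggested above, and additionally anchors the pattern so the element's whole text must be numeral letters; the anchoring is an extra technique not proposed in the thread. The `<p>` tag choice, the sample paragraphs and the helper name are hypothetical (only the `<h2>` line is from the thread), and the XPath in the comment is an untested guess at the calibre equivalent.

```python
import re
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Anchored version of the class: the element's entire text must be numeral
# letters, optionally padded with whitespace.
NUMERAL_ONLY = re.compile(r"^\s*[IVXLCDM]+\s*$")

# Hypothetical markup; only the <h2> line is taken from the thread.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h2 id="calibre_toc_2" class="calibre4">THE LAST UNICORN</h2>
<p>I</p>
<p>I walked home.</p>
<p> XIV </p>
</body></html>"""

def numeral_headings(root, tag="p"):
    # Roughly what //h:p[re:test(., '^\s*[IVXLCDM]+\s*$', '')] is meant to select:
    # only elements of the given tag whose whole text is a numeral candidate.
    hits = []
    for el in root.iter(XHTML + tag):
        text = "".join(el.itertext())
        if NUMERAL_ONLY.match(text):
            hits.append(text.strip())
    return hits

root = ET.fromstring(SAMPLE)
print(numeral_headings(root))        # ['I', 'XIV']
print(numeral_headings(root, "h2"))  # [] (the title is no longer matched)
```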
11-03-2010, 08:57 PM   #14
Gwen Morse
Addict
 
Posts: 254
Karma: 59872
Join Date: Dec 2009
Location: New York, USA
Device: Kindle 3 (wifi) + nokia n900 tablet phone
Quote:
Originally Posted by Manichean View Post
Well, technically, and I should have thought of that, everything containing one of the uppercase letters gets matched. Or, to be precise, the uppercase letters in the words get matched. To avoid this, look at the occurrences of the roman numerals, find e.g. what tags they are enclosed in and alter the expression to match those as well.
Will do that when I get home and look things over.

Other books have since triggered false matches. I had only tried it on a few books at first, and I was lucky that few of them had capitalized text that got caught.
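Another way to cut down on these extra hits is to accept only well-formed numerals rather than any run of numeral letters. The pattern below is the common textbook Roman-numeral regex, not something from this thread, shown with Python's `re` module; the test words are made up. It rejects words made only of numeral letters, such as "CIVIC" or "LIVID", but it still cannot tell the numeral "MIX" (1009) from the word. Whether it behaves identically inside calibre's `re:test` is an untested assumption.

```python
import re

# Accepts only well-formed numerals from I to MMMCMXCIX (1-3999):
# thousands, hundreds, tens and units, each in its standard form.
# The lookahead rejects the empty string, which all four groups could match.
STRICT_ROMAN = re.compile(
    r"^(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)

for word in ["XIV", "CL", "MCMXCIX", "CIVIC", "MILD", "LIVID", "MIX"]:
    print(word, bool(STRICT_ROMAN.match(word)))
# XIV True
# CL True
# MCMXCIX True
# CIVIC False
# MILD False
# LIVID False
# MIX True
```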

Similar Threads
Thread Thread Starter Forum Replies Last Post
can calibre creat a TOC by reading numerals as chapter marks p3aul Calibre 7 10-04-2010 04:18 AM
Les Miserables - TOC / Chapter Flags? gshipley Amazon Kindle 1 09-28-2009 10:07 PM
ADE and long chapter names in ToC frabjous Reading and Management 3 08-14-2009 11:16 AM
Chapter or TOC Creation help needed gandor62 Calibre 4 04-15-2009 02:18 PM
html2epub TOC and chapter detection help ilovejedd Calibre 6 02-22-2009 05:58 PM




MobileRead.com is a privately owned, operated and funded community.