Chapter detection for roman numerals??

g25 · 09-07-2015, 02:03 AM

Hi, when I convert I would like calibre to detect roman numeral chapters. These chapters don't have the word "chapter" in them, just:
I
II
III
IV
etc..

I figured something like [IVXLCDM]+, but I'm not sure exactly how to write it into the "detect chapter" XPath expression under the Structural Detection section. Some of the books use <h2> tags for the chapter numerals and some use ordinary <p> tags. What exact expression would I put in the XPath line to detect any combination of "IVXLCDM" as chapters?

Thanks!

eschwartz · 09-07-2015, 10:37 AM

Better question: What XPath would also manage to avoid the "I" as in, "you and I"?
You need more semantic information.

Are there any unique classes or ids used in the <p> chapter tags?

theducks · 09-07-2015, 10:54 AM

Quote:

Originally Posted by g25

Hi, when I convert I would like calibre to detect roman numeral chapters. These chapters don't have the word "chapter" in them, just:
I
II
III
IV
etc..

I figured something like [IVXLCDM]+, but I'm not sure exactly how to write it into the "detect chapter" XPath expression under the Structural Detection section. Some of the books use <h2> tags for the chapter numerals and some use ordinary <p> tags. What exact expression would I put in the XPath line to detect any combination of "IVXLCDM" as chapters?

Thanks!

I don't remember ever seeing a chapter in the M range, or D. Did I forget one?

I do a lot of my work with the editor, rather than fight a special case Xpath.

Beware the lone "I" (as in "I want more") in other places. The Roman numerals need to exist as the only string between tags, or along with a limited set of defined keywords.

([CLXVI]{1,7}) is the basic part of my EDITOR search term.
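
The "only string between tags, or along with a limited set of keywords" condition can be tried outside calibre first. Below is a minimal Python sketch of that idea; the keyword list (`Chapter`, `Part`, `Book`) and the sample lines are made up for illustration and are not taken from any particular book.

```python
import re

# Hypothetical keyword list; adjust it to whatever the book actually uses.
KEYWORDS = ("Chapter", "Part", "Book")

# Numerals must be the only text between '>' and '<',
# optionally preceded by one of the keywords.
HEADING = re.compile(
    r">\s*(?:(?:%s)\s+)?([CLXVI]{1,7})\s*<" % "|".join(KEYWORDS)
)

samples = [
    "<h2>IV</h2>",
    '<p class="calibre1">XII</p>',
    "<p>Chapter IX</p>",
    "<p>I want more.</p>",
    "<p>you and I</p>",
]

for line in samples:
    match = HEADING.search(line)
    print(line, "->", match.group(1) if match else None)
```

The last two samples are the "lone I" cases: they are rejected because the numeral is not the only thing between the tags.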

The TOC tool (also inside the editor) allows all sorts of solutions for indexing a book.

g25 · 09-08-2015, 09:31 AM

Well, ([CLXVI]{1,7}) works, but unfortunately it also grabs every other sentence that starts with those letters :/

Any way to tell it to match ONLY the Roman numerals that are on a line by themselves?

g25 · 09-08-2015, 09:41 AM

Quote:

Originally Posted by eschwartz

Better question: What XPath would also manage to avoid the "I" as in, "you and I"?
You need more semantic information.

Are there any unique classes or ids used in the <p> chapter tags?

Nope. The classes/ids aren't unique at all. The roman numerals are on a line by themselves though. Can't we have an expression that says: Look for any combination of these roman numerals that reside on a line by themselves?

Or reside within ANY tag that has NO other text but a combination of this ([CLXVI]{1,7}) ?

g25 · 09-08-2015, 10:18 AM

Ah got it I think:

//*[re:test(., '[CLXVI]+$')]

seems to work!
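
One thing to watch with that expression: only the end of the text is anchored, so any element whose text merely ends in one of those letters would also match. The sketch below uses plain Python `re` to compare it with a fully anchored variant. It assumes `re:test` uses search semantics like Python's `re.search`, and the sample strings are invented. An XPath along the lines of `//*[(name()='h2' or name()='p') and re:test(., '^\s*[CLXVI]+\s*$')]` would be the corresponding thing to try in calibre; that expression is an untested assumption, not something confirmed in this thread.

```python
import re

# The expression from the post above: only the END of the text is anchored.
LOOSE = re.compile(r"[CLXVI]+$")
# Assumed tighter variant: the whole text must be numerals plus optional whitespace.
STRICT = re.compile(r"^\s*[CLXVI]+\s*$")


def is_heading(pattern, text):
    """Return True if the pattern is found anywhere in text (search semantics)."""
    return pattern.search(text) is not None


for text in ["IV", " XII ", "you and I", "He moved to NYC", "Chapter V"]:
    print(repr(text), "loose:", is_heading(LOOSE, text), "strict:", is_heading(STRICT, text))
```

The loose pattern accepts "you and I" and "He moved to NYC"; the anchored one accepts only text that consists of numerals and surrounding whitespace.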

gbm · 09-08-2015, 10:28 AM

Quote:

Originally Posted by g25

Nope. The classes/ids aren't unique at all. The roman numerals are on a line by themselves though. Can't we have an expression that says: Look for any combination of these roman numerals that reside on a line by themselves?

Or reside within ANY tag that has NO other text but a combination of this ([CLXVI]{1,7}) ?

Using the editor, this will find the Roman numerals that sit on a line by themselves, but you will need to run it with Find Next and then edit each chapter heading manually.*

Code:

>([CLXVI]{1,7})<

*Assumes that there are no spaces before or after the roman numerals.
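
If the numerals might be padded with spaces, allowing optional whitespace on both sides relaxes that assumption. Here is a small Python sketch comparing the two patterns on a made-up HTML fragment; the `\s*` variant is my assumption and has not been tried in the calibre editor itself.

```python
import re

# Invented sample markup, including a padded numeral and two "lone I" lines.
html = """<h2>I</h2>
<p>I went home.</p>
<p class="c1"> II </p>
<p>III</p>
<p>you and I</p>"""

EXACT = re.compile(r">([CLXVI]{1,7})<")         # pattern from the post above
PADDED = re.compile(r">\s*([CLXVI]{1,7})\s*<")  # assumed whitespace-tolerant variant

exact_hits = EXACT.findall(html)
padded_hits = PADDED.findall(html)

print("exact: ", exact_hits)
print("padded:", padded_hits)
```

On this sample the exact pattern misses the padded " II " heading, while the padded pattern picks it up; both skip the ordinary sentences containing "I".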

bernie


