Old 09-07-2015, 02:03 AM   #1
g25
Chapter detection for Roman numerals?

Hi, when I convert I would like calibre to detect roman numeral chapters. These chapters don't have the word "chapter" in them, just:
I
II
III
IV
etc..

I figured something like [IVXLCDM]+ would do it, but I'm not sure exactly how to write it into the "detect chapter" XPath expression under the Structure Detection section. Some of the books put the chapter number in an <h2> tag and some of them put it in a normal <p> tag. What exact expression would I use in the XPath line to detect any combination of "IVXLCDM" as a chapter?

Thanks!

Last edited by g25; 09-07-2015 at 02:17 AM.
Old 09-07-2015, 10:37 AM   #2
eschwartz
Better question: What XPath would also manage to avoid the "I" as in, "you and I"?
You need more semantic information.


Are there any unique classes or ids used in the <p> chapter tags?
Old 09-07-2015, 10:54 AM   #3
theducks
I don't remember ever seeing a chapter number in the M range.
D? Did I forget one?

I do a lot of my work with the editor rather than fight a special-case XPath.

Beware the lone I (as in "I want more") in other places. The Roman numerals need to be the only string between tags, or appear along with a limited set of defined keywords.
([CLXVI]{1,7}) is the basic part of my EDITOR search term.

The TOC tool (also inside the editor) allows all sorts of solutions for indexing a book.
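
To illustrate the "lone I" warning, here is a minimal Python sketch. It is not calibre code: the sample lines are made up, and Python's `re` module stands in for the editor's search. It compares the bare character class with the same class required to fill the whole line.

```python
import re

# The character class from the post, compiled once.
NUMERAL = re.compile(r"[CLXVI]{1,7}")

# Hypothetical sample lines, not taken from any real book.
lines = ["III", "I want more", "you and I", "XIV", "Chapter notes"]

# search() finds the class anywhere in the line, so a lone "I" or the
# capital "C" of "Chapter" is enough to count as a hit.
bare_hits = [ln for ln in lines if NUMERAL.search(ln)]

# fullmatch() requires the whole (stripped) line to consist of the class.
whole_hits = [ln for ln in lines if NUMERAL.fullmatch(ln.strip())]

print("bare:", bare_hits)
print("whole line:", whole_hits)
```

The whole-line test keeps only the lines that are nothing but numeral letters, which is the "only string between tags" condition described above. All-caps words built from the same letters (for example "CIVIL") would still slip through.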
Old 09-08-2015, 09:31 AM   #4
g25
Well, ([CLXVI]{1,7}) works, but unfortunately it also grabs every other sentence that starts with those letters :/

Any way to tell it to match ONLY the Roman numerals that are on a line by themselves?
Old 09-08-2015, 09:41 AM   #5
g25
Quote:
Originally Posted by eschwartz:
Better question: What XPath would also manage to avoid the "I" as in, "you and I"?
You need more semantic information.


Are there any unique classes or ids used in the <p> chapter tags?
Nope. The classes/ids aren't unique at all. The roman numerals are on a line by themselves though. Can't we have an expression that says: Look for any combination of these roman numerals that reside on a line by themselves?

Or reside within ANY tag that has NO other text but a combination of this ([CLXVI]{1,7}) ?

Last edited by g25; 09-08-2015 at 09:59 AM.
Old 09-08-2015, 10:18 AM   #6
g25
Ah got it I think:

//*[re:test(., '[CLXVI]+$')]

seems to work!
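
One caveat worth checking: the pattern '[CLXVI]+$' is anchored only at the end, so, assuming re:test behaves like an unanchored search (as Python's `re.search` does), it would also accept any element whose text merely ends in one of those letters. The sketch below emulates that test in plain Python on a made-up fragment; it does not call calibre or evaluate real XPath, and the `matching_elements` helper is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment: numerals in <h2> and <p>, plus prose ending in "I".
SAMPLE = """<body>
<h2>I</h2>
<p>Nobody was home except you and I</p>
<p>II</p>
<p>The door opened.</p>
</body>"""

UNANCHORED = re.compile(r"[CLXVI]+$")   # pattern from the post
ANCHORED = re.compile(r"^[CLXVI]+$")    # same class, anchored at both ends


def matching_elements(pattern, root):
    """Return (tag, text) for every element whose text content matches."""
    hits = []
    for el in root.iter():
        # Joined, stripped text: a rough stand-in for the XPath string
        # value of "." after trimming surrounding whitespace.
        text = "".join(el.itertext()).strip()
        if pattern.search(text):
            hits.append((el.tag, text))
    return hits


root = ET.fromstring(SAMPLE)
unanchored_hits = matching_elements(UNANCHORED, root)
anchored_hits = matching_elements(ANCHORED, root)

print("unanchored:", unanchored_hits)
print("anchored:", anchored_hits)
```

If the same thing happens in a real book, an anchored variant such as //*[re:test(normalize-space(.), '^[CLXVI]+$')] might be worth trying. That expression is a suggestion only and has not been tested against calibre here.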
Old 09-08-2015, 10:28 AM   #7
gbm
Quote:
Originally Posted by g25:
Nope. The classes/ids aren't unique at all. The roman numerals are on a line by themselves though. Can't we have an expression that says: Look for any combination of these roman numerals that reside on a line by themselves?

Or reside within ANY tag that has NO other text but a combination of this ([CLXVI]{1,7}) ?
Using the editor, this will find the Roman numerals that sit on a line by themselves,* but you will need to run it using Find Next and then edit each chapter heading manually.
Code:
>([CLXVI]{1,7})<
*Assumes that there are no spaces before or after the Roman numerals.
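
If some numerals do have spaces around them, one possible tweak is to allow optional whitespace on either side. The Python sketch below compares the two patterns on made-up markup (the class name calibre1 is only an example). It uses Python's `re` module, so behaviour in the calibre editor's search may differ.

```python
import re

STRICT = re.compile(r">([CLXVI]{1,7})<")          # pattern from the post
TOLERANT = re.compile(r">\s*([CLXVI]{1,7})\s*<")  # allows surrounding whitespace

# Hypothetical markup: a tight numeral, a padded numeral, and ordinary prose.
html = (
    '<p class="calibre1">IV</p>\n'
    '<p class="calibre1"> V </p>\n'
    '<p class="calibre1">I want more</p>'
)

# findall() returns only the captured numeral from each match.
strict_hits = STRICT.findall(html)
tolerant_hits = TOLERANT.findall(html)

print("strict:", strict_hits)
print("tolerant:", tolerant_hits)
```

Both patterns skip "I want more", because the text after the numeral is not immediately followed by a tag.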

bernie