Old 04-08-2012, 04:08 AM   #1
JimLL
Cynic
Posts: 75
Karma: 51078
Join Date: Feb 2012
Device: Kindle
Xpath Expression list

Where can I find a list of proper Xpath expressions with what each does?

For instance, I want to key the TOC on the word "Chapter " in a document that has no heading tags.

I found that

//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\s+', 'i')) or @class = 'chapter']

does not do it. Surely there is an expression that will find one word. Surely someone has compiled a list of expressions.

Yes, I know there is a lot of tech talk in the help (tags, expressions, classes), but that is precisely the problem. Besides the weird syntax I have to try to remember, every explanation uses words that themselves need explaining, and I simply can't remember enough of it to put everything together.

Guys with great memories will have problems with my saying that. The usual comment assumes laziness. They'll say, "Just read it!" Well I'm sorry, when your brain is made of chicken wire, when you've had issues, "Just read it!" doesn't get it.

List?
Old 04-08-2012, 04:40 AM   #2
wallcraft
reader
Posts: 6,975
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
Your problem is with the name() tests, which restrict the match to h1 and h2 tags.
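To see why that expression finds nothing in a document without headings, here is a rough Python sketch of its logic. It only imitates the predicate with the standard library; it is not how calibre evaluates XPath, and the sample document and the helper name `default_rule` are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: one chapter title in a plain <p>, one real heading,
# and one paragraph that carries class="chapter".
doc = ET.fromstring(
    '<body>'
    '<p>Chapter 1</p>'
    '<p>Some text.</p>'
    '<h2>Part Two</h2>'
    '<p class="chapter">Epilogue</p>'
    '</body>'
)

def default_rule(el):
    """Imitate: ((name()='h1' or name()='h2') and re:test(., ..., 'i')) or @class='chapter'."""
    is_heading = el.tag in ("h1", "h2")
    text = "".join(el.itertext())  # all text inside the element
    # Note: the \s+ only applies to "part", because "|" splits the alternatives.
    has_keyword = re.search(r"chapter|book|section|part\s+", text, re.IGNORECASE)
    return (is_heading and has_keyword is not None) or el.get("class") == "chapter"

selected = ["".join(el.itertext()) for el in doc.iter() if default_rule(el)]
print(selected)
```

The plain `<p>Chapter 1</p>` is skipped because it is neither an h1/h2 nor marked with class "chapter", even though its text contains the keyword.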

In my experience, searching for chapter alone often returns false positives, but try one of:
Code:
//*[re:test(., '^chapter ', 'i')]
//h:p[re:test(., '^chapter ', 'i')]
//*[re:test(., 'Chapter [1-9]')]
The "*" means search every element, which often leads to multiple hits on the same chapter. If that happens, try the "h:p" version, which limits the search to <p> elements (or something like that). The "^" means match at the start of the element's text, and [1-9] means a single digit from 1 to 9. Remove the ", 'i'" to stop ignoring case (so the capitalization must match exactly).
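If it helps to see the difference between the three expressions, here is a small Python sketch that imitates them with the standard library. It is an approximation, assuming the regex test is applied to all the text inside each element; the sample page and the helper name `matches` are invented, and calibre itself evaluates real XPath rather than this loop.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page: two real chapter titles and two ordinary sentences.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p><b>Chapter 1</b></p>'
    '<p>In this chapter nothing happens.</p>'
    '<p><b>Chapter 27</b></p>'
    '<p>chapter and verse were quoted.</p>'
    '</body></html>'
)

def matches(root, pattern, flags="", tag=None):
    """Imitate //*[re:test(., pattern, flags)], or //h:TAG[...] when tag is given."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    hits = []
    for el in root.iter():
        if tag is not None and el.tag != f"{{{XHTML_NS}}}{tag}":
            continue  # the h:p form only looks at <p> elements
        text = "".join(el.itertext())  # all text inside the element
        if re.search(pattern, text, re_flags):
            hits.append((el.tag.split("}")[-1], text))
    return hits

root = ET.fromstring(SAMPLE)
star_hits = matches(root, r"^chapter ", "i")            # like //*[re:test(., '^chapter ', 'i')]
p_hits = matches(root, r"^chapter ", "i", tag="p")      # like //h:p[re:test(., '^chapter ', 'i')]
digit_hits = matches(root, r"Chapter [1-9]")            # like //*[re:test(., 'Chapter [1-9]')]

print([name for name, _ in star_hits])
print([text for _, text in p_hits])
print([name for name, _ in digit_hits])
```

In this sample the "*" form hits the same title twice (once for the `<p>` and once for the `<b>` inside it) and also hits the enclosing html and body elements, while the "h:p" form gives one hit per paragraph. The last paragraph still matches with 'i' because it happens to start with a lowercase "chapter ", which is the kind of false positive that a case-sensitive match or a required digit removes.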

Last edited by wallcraft; 04-08-2012 at 04:51 AM.
Old 04-08-2012, 02:02 PM   #3
JimLL
Cynic
Quote:
Originally Posted by wallcraft
Your problem is with the name() tests, which restrict the match to h1 and h2 tags.

In my experience, searching for chapter alone often returns false positives, but try one of:
Code:
//*[re:test(., '^chapter ', 'i')]
//h:p[re:test(., '^chapter ', 'i')]
//*[re:test(., 'Chapter [1-9]')]
The "*" means search every element, which often leads to multiple hits on the same chapter. If that happens, try the "h:p" version, which limits the search to <p> elements (or something like that). The "^" means match at the start of the element's text, and [1-9] means a single digit from 1 to 9. Remove the ", 'i'" to stop ignoring case (so the capitalization must match exactly).
Thanks, I'll work on this.

I don't know what <p> elements means, because I don't know what elements are.

Does that [1-9] apply to multiple following digits? Like Chapter 27 or Chapter 103?

No, I don't think I want to ignore case.
Old 04-08-2012, 02:16 PM   #4
wallcraft
reader
Quote:
Originally Posted by JimLL
I don't know what <p> elements means, because I don't know what elements are.

Does that [1-9] apply to multiple following digits? Like Chapter 27 or Chapter 103?

No, I don't think I want to ignore case.
An element is what HTML is made up of; see HTML Elements. My point was to try "*" first (match everywhere) and, if you get multiple entries for the same chapter, then try "h:p". It does not necessarily matter what it means, only that it often works to knock out multiple matches.

The [1-9] only matches the first digit, but the chapter heading is typically taken to be the entire element (that word again) that contained the match. So the intent is to exclude "chapter" used in a sentence, by requiring it to be followed by a digit. If you want to match multiple digits, use [0-9]+, where the "+" means one or more instances; note the 0, which is needed for numbers such as 10 (although this will also match chapter 0).
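A quick way to check the digit behaviour is with plain Python regular expressions, as in this sketch. The sample headings and the helper name `found` are invented; the point is only to show what part of each string the pattern itself covers.

```python
import re

# Hypothetical headings to try the two patterns against.
headings = [
    "Chapter 7",
    "Chapter 27",
    "Chapter 103",
    "Chapter 0",
    "Chapter One",
    "the last chapter 9",
]

def found(pattern, text):
    """Return the exact piece of text the pattern matched, or None if no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

first_digit = {h: found(r"Chapter [1-9]", h) for h in headings}
all_digits = {h: found(r"Chapter [0-9]+", h) for h in headings}

for h in headings:
    print(f"{h!r}: [1-9] -> {first_digit[h]!r}, [0-9]+ -> {all_digits[h]!r}")
```

With [1-9], "Chapter 27" and "Chapter 103" still count as matches even though the pattern only covers "Chapter 2" and "Chapter 1". That is enough for chapter detection, because the whole element that contains the match becomes the heading.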

Not ignoring case is another way to limit the matches to those you want. If the simplest match gives what you want there is no need to try anything else.
Old 04-08-2012, 02:29 PM   #5
JimLL
Cynic
Thanks, wallcraft.