Xpath Expression list

JimLL · 04-08-2012, 04:08 AM

Where can I find a list of proper Xpath expressions with what each does?

For instance, I want to cue the TOC on the word "Chapter " in a document that has no heading attributes.

I found that

//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\s+', 'i')) or @class = 'chapter']

does not do it. Surely there is an expression that will find one word. Surely someone has compiled a list of expressions.

Yes, I know there is a lot of tech talk in the helps: (tags, expressions, classes), but that is precisely the problem. Besides the weird syntax to attempt to remember, every explanation uses words that themselves need explaining and I simply can't remember it enough to put everything together.

Guys with great memories will have problems with my saying that. The usual comment assumes laziness. They'll say, "Just read it!" Well I'm sorry, when your brain is made of chicken wire, when you've had issues, "Just read it!" doesn't get it.

List?

wallcraft · 04-08-2012, 04:40 AM

Your problem is with the name() elements, which only find h1 and h2 tags.

In my experience, searching for chapter alone often returns false positives but try one of:

Code:

//*[re:test(., '^chapter ', 'i')]
//h:p[re:test(., '^chapter ', 'i')]
//*[re:test(., 'Chapter [1-9]')]

The "*" means search every element, which often leads to multiple hits on the same chapter. If so, try the "h:p" version, which limits the search to <p> elements (or something like that). The "^" means match at the start of the element's text, and [1-9] means a single digit between 1 and 9. Remove the ", 'i'" to stop ignoring case (an exact match of the string).
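To see why "*" can give several hits for one chapter, here is a rough sketch in Python using only the standard library. It is an emulation, not what Calibre runs: the sample document and the helper names `local_name` and `matches` are made up for illustration, and joining `itertext()` is only an approximation of what "." means in XPath.

```python
import re
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical sample book with no heading tags, only <p> elements
SAMPLE = f"""<html xmlns="{XHTML}"><body>
<div><p>Chapter 1</p><p>In this chapter we meet the hero.</p></div>
<div><p>Chapter 27</p><p>The end.</p></div>
</body></html>"""

def local_name(elem):
    # ElementTree stores tags as "{namespace}name"; keep only the name part
    return elem.tag.split("}", 1)[-1]

def matches(root, pattern, flags=0, only_tag=None):
    """Return (tag, text) for every element whose full text matches pattern."""
    hits = []
    for elem in root.iter():
        if only_tag is not None and local_name(elem) != only_tag:
            continue
        # Text of the element plus all its descendants (stand-in for ".")
        text = "".join(elem.itertext())
        if re.search(pattern, text, flags):
            hits.append((local_name(elem), text.strip()))
    return hits

root = ET.fromstring(SAMPLE)

# Roughly like //*[re:test(., 'Chapter [1-9]')]
everything = matches(root, r"Chapter [1-9]")

# Roughly like //h:p[re:test(., '^chapter ', 'i')]
paragraphs = matches(root, r"^chapter ", re.IGNORECASE, only_tag="p")

print([tag for tag, _ in everything])
print(paragraphs)
```

In this sketch the "*" search also hits the html, body and div elements, because each of them contains the chapter text of its children. Restricting the search to p elements leaves one hit per chapter.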

JimLL · 04-08-2012, 02:02 PM

Quote:

Originally Posted by wallcraft

Your problem is with the name() elements, which only find h1 and h2 tags.

In my experience, searching for chapter alone often returns false positives but try one of:

Code:

//*[re:test(., '^chapter ', 'i')]
//h:p[re:test(., '^chapter ', 'i')]
//*[re:test(., 'Chapter [1-9]')]

The "*" means search every element, which often leads to multiple hits on the same chapter. If so, try the "h:p" version, which limits the search to <p> elements (or something like that). The "^" means match at the start of the element's text, and [1-9] means a single digit between 1 and 9. Remove the ", 'i'" to stop ignoring case (an exact match of the string).

Thanks, I'll work on this.

I don't know what <p> elements means, because I don't what elements are.

Does that [1-9] apply to multiple following digits? Like Chapter 27 or Chapter 103?

No, I don't think I want to ignore case.

wallcraft · 04-08-2012, 02:16 PM

Quote:

Originally Posted by JimLL

I don't know what <p> elements means, because I don't what elements are.

Does that [1-9] apply to multiple following digits? Like Chapter 27 or Chapter 103?

No, I don't think I want to ignore case.

An element is what HTML is made up of; see HTML Elements. My point was to try "*" first (match everywhere) and, if you get multiple entries for the same chapter, then try "h:p". It does not necessarily matter what it means, only that it often works to knock out multiple matches.

The [1-9] only matches the first digit, but the chapter heading is typically taken to be the entire element (that word again) that contained the match. So the intent is to exclude "chapter" used in a sentence, by requiring it to be followed by a digit. If you want to match multiple digits, use [0-9]+ where the "+" means one or more instances and note the 0 (e.g. in 10, although this will also match chapter 0).
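A quick way to check the digit patterns is to try them outside the TOC settings. The sketch below uses Python's `re` module as a stand-in for the regex part of re:test; the sample strings and the helper name `matched` are made up for illustration.

```python
import re

def matched(pattern, text):
    """Return the part of text the pattern matched, or None if no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

samples = [
    "Chapter 7",
    "Chapter 27",
    "Chapter 103",
    "Chapter 0",
    "this chapter is long",
]

for text in samples:
    first_digit = matched(r"Chapter [1-9]", text)    # one digit, 1 to 9
    all_digits = matched(r"Chapter [0-9]+", text)    # one or more digits
    print(f"{text!r}: [1-9] -> {first_digit!r}, [0-9]+ -> {all_digits!r}")
```

Both patterns find "Chapter 27" and "Chapter 103"; they differ only in how much of the number is part of the match, and in whether "Chapter 0" counts. Neither matches the lowercase "chapter" in a sentence, since case is not ignored here.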

Not ignoring case is another way to limit the matches to those you want. If the simplest match gives what you want there is no need to try anything else.

JimLL · 04-08-2012, 02:29 PM

Thanks, wallcraft.
