Old 11-26-2010, 11:28 AM   #1
tram
Junior Member
tram began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Nov 2010
Device: Kindle
Calibre's XPath implementation for TOC detection

I'm converting a few books to go on my Kindle, and I'm trying to get the chapters set up the way I want. One thing I'm noticing is that Calibre seems to use a slightly different flavor of XPath than what I'm used to. Where I would write
Code:
//p[matches(., '^[A-Z ]+$')]
, Calibre seems to respond to things in the form
Code:
//h:p[re:test(., '[A-Z]+')]
re:test seems to behave differently than matches. Is there any documentation on re:test, or anything anyone can tell me that might enhance my googling abilities?

The regular expression I'm trying to use that works with matches but not with re:test follows:

Code:
/h:p[re:test(., '^[A-Z]+ [A-Z ]+$')]
Old 11-26-2010, 11:33 AM   #2
Manichean
Wizard
Manichean is the 'tall, dark, handsome stranger' all the fortune-tellers are referring to.
 
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
There's documentation on XPath in the manual. The behaviour of the regular expressions and the re:test function should be documented in the Python documentation.
Old 11-26-2010, 12:33 PM   #3
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by tram
re:test seems to behave differently than matches. Is there any documentation on re:test, or anything anyone can tell me that might enhance my googling abilities?

The regular expression I'm trying to use that works with matches but not with re:test follows:

Code:
/h:p[re:test(., '^[A-Z]+ [A-Z ]+$')]
What makes you think test doesn't work the same as matches? Is it possible that what you are trying to match with test isn't the same as what you are testing the match expression on? Are you trying to match across lines?
re:test()
re:test(src, pattern, flags) returns true if the string src matches the regular expression pattern.
Old 11-26-2010, 12:59 PM   #4
tram
Junior Member
tram began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Nov 2010
Device: Kindle
I have an XPath visualization tool that I've been using to test expressions, querying against the HTML file that gets left in the debug 'processed' directory. Using that tool, I pull out the elements I'm looking for, but I use matches() instead of re:test. The tool isn't written in Python, so that's probably the difference.

I can check the manual, but is there a standard method for testing XPath queries aside from the generate-and-view method? I feel like the regular expression I'm trying to use isn't particularly complicated, and that it should work just fine.
Old 11-26-2010, 01:28 PM   #5
Manichean
Wizard
Manichean is the 'tall, dark, handsome stranger' all the fortune-tellers are referring to.
 
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by tram
I can check the manual, but is there a standard method for testing XPath queries aside from the generate-and-view method? I feel like the regular expression I'm trying to use isn't particularly complicated, and that it should work just fine.
I don't know if XPath comes in several flavors as well, but for regexes, your test tool needs to support the Python flavor. Personally, I use Python to test expressions, though that may be a bit of overkill if you don't actually use it as a language...
Old 11-26-2010, 02:04 PM   #6
wallcraft
reader
wallcraft ought to be getting tired of karma fortunes by now.
 
Posts: 6,977
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
The XPath Tutorial says:
Quote:
re:test()
re:test(src, pattern, flags) returns true if the string src matches the regular expression pattern. A particularly useful flag is i, it makes matching case insensitive. A good primer on the syntax for regular expressions can be found at regexp syntax
Python Docs say:
Quote:
7.2.6.4. search() vs. match()

In a nutshell, match() only attempts to match a pattern at the beginning of a string where search() will match a pattern anywhere in a string.
Is "matching" being used in the Python sense in re:test, i.e. as in re.match()?
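As a rough illustration of the distinction in the Python docs quoted above, here is a minimal sketch using only Python's standard re module. The sample strings are invented for the example, and the sketch says nothing by itself about which of the two calls re:test makes internally. What it does show is that once a pattern is anchored with ^ and $, search() and match() give the same answer, so the choice should not matter for the anchored expression in the first post.

```python
import re

# Invented sample strings: an all-caps heading, and a line that merely contains it.
heading = "THE FIRST CHAPTER"
mixed = "see THE FIRST CHAPTER below"

unanchored = re.compile(r"[A-Z]+ [A-Z ]+")
anchored = re.compile(r"^[A-Z]+ [A-Z ]+$")

results = {
    # match() only tries at the start of the string, so it fails on "see ...".
    "match_unanchored_mixed": unanchored.match(mixed) is not None,
    # search() scans the whole string and finds the capitalised run.
    "search_unanchored_mixed": unanchored.search(mixed) is not None,
    # With ^ and $ in the pattern, both calls agree.
    "match_anchored_mixed": anchored.match(mixed) is not None,
    "search_anchored_mixed": anchored.search(mixed) is not None,
    "match_anchored_heading": anchored.match(heading) is not None,
    "search_anchored_heading": anchored.search(heading) is not None,
}

for name, matched in results.items():
    print(f"{name}: {matched}")
```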
Old 11-26-2010, 02:58 PM   #7
janvanmaar
Addict
janvanmaar has a complete set of Star Wars action figures.
 
Posts: 219
Karma: 404
Join Date: Nov 2010
Device: Kindle 3G, Samsung SIII
Quote:
Originally Posted by tram
I'm converting a few books to go on my Kindle, and I'm trying to get the chapters set up the way I want. One thing I'm noticing is that Calibre seems to use a slightly different flavor of XPath than what I'm used to. Where I would write
Code:
//p[matches(., '^[A-Z ]+$')]
, Calibre seems to respond to things in the form
Code:
//h:p[re:test(., '[A-Z]+')]
Yes, I was also surprised by the XPath syntax. I believe it is a Calibre-specific thing; Kovid may correct me if I am wrong. But it was very simple to get used to.

Quote:
Originally Posted by tram
re:test seems to behave differently than matches. Is there any documentation on re:test, or anything anyone can tell me that might enhance my googling abilities?

The regular expression I'm trying to use that works with matches but not with re:test follows:

Code:
/h:p[re:test(., '^[A-Z]+ [A-Z ]+$')]
This should work fine (with two slashes at the beginning, of course). I am using regexps like this regularly in Calibre. Are you sure the text is in the p tag and not, for instance, wrapped in a div or some other tag, or split across several tags AFTER Calibre's HTML processing? You can find out by enabling the debug option and looking in the chosen debug directory at the HTML file Calibre is actually processing.
Old 11-26-2010, 03:05 PM   #8
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,349
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The h: prefix is not calibre specific, it's XHTML specific. It indicates you are looking for the <p> tag in the XHTML namespace. This is necessary because the conversion pipeline internally converts everything to XHTML in the intermediate stage.

As far as I am aware, calibre's XPath implementation is bog standard, and its regex implementation uses Python's regex implementation.
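A small sketch of the namespace point, using Python's standard xml.etree.ElementTree rather than calibre's own pipeline (an assumption made to keep the example dependency-free; the sample document is invented). It shows why an unprefixed p finds nothing once the document is XHTML, while a prefix bound to the XHTML namespace does.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Invented sample: every element lives in the XHTML namespace via the default xmlns.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>CHAPTER ONE</p><p>Some text.</p></body></html>"
)

# An unprefixed name means "p in no namespace", so nothing is found.
unprefixed = doc.findall(".//p")

# Binding a prefix (here "h", any name would do) to the XHTML namespace finds both.
prefixed = doc.findall(".//h:p", {"h": XHTML_NS})

print(len(unprefixed), len(prefixed))
```

The prefix itself is arbitrary; what matters is the namespace URI it is bound to.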
Old 11-26-2010, 03:17 PM   #9
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by janvanmaar
This should work fine (with two slashes at the beginning of course). I am using regexps like this regularly in Calibre. Are you sure the text is in the p tag and not for instance wrapped in div or some other tag
I use expressions like that, too, and like you, I think his problem has nothing to do with match, search or test differences. It's probably a simple case of not matching the test regex because there's some difference between the string regex sees in test and the string he's running match on.
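One way the two strings can differ, offered only as a hypothetical example and not as a diagnosis of the file in question: the string value of a paragraph includes any whitespace and line breaks inside it, plus the text of nested tags, and an anchored pattern can then fail. The sketch below uses Python's standard library and an invented paragraph to show the effect.

```python
import re
import xml.etree.ElementTree as ET

anchored = re.compile(r"^[A-Z]+ [A-Z ]+$")

# Invented paragraph: the heading is split across a nested tag and padded with whitespace.
p = ET.fromstring("<p>\n  <span>CHAPTER</span> ONE\n</p>")

# Roughly what "." evaluates to in the predicate: all descendant text, concatenated.
raw = "".join(p.itertext())

# Collapse runs of whitespace and trim the ends.
normalized = " ".join(raw.split())

raw_matches = anchored.search(raw) is not None
normalized_matches = anchored.search(normalized) is not None

print(repr(raw), raw_matches)
print(repr(normalized), normalized_matches)
```

In XPath itself, normalize-space(.) performs the same trimming, so a predicate such as re:test(normalize-space(.), '^[A-Z]+ [A-Z ]+$') is one thing to try if the processed HTML turns out to contain stray whitespace.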
Old 11-26-2010, 03:18 PM   #10
janvanmaar
Addict
janvanmaar has a complete set of Star Wars action figures.
 
Posts: 219
Karma: 404
Join Date: Nov 2010
Device: Kindle 3G, Samsung SIII
Thanks for the correction and explanation, Kovid, it makes perfect sense now.
Old 11-26-2010, 03:43 PM   #11
tram
Junior Member
tram began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Nov 2010
Device: Kindle
Hey Kovid, could you tell us which XPath/DOM library you're using? I'm lazy and I don't want to check out the source, and I'd like to replicate what I'm seeing in the calibre TOC interface.
Old 11-26-2010, 04:14 PM   #12
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,349
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
lxml.
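For anyone wanting to try expressions outside calibre, a minimal sketch with lxml follows. It assumes lxml is installed, uses an invented sample document, and binds the re prefix to the EXSLT regular-expressions namespace that lxml supports. How calibre itself sets up its prefixes is not shown in this thread, so treat the prefix mapping as an assumption.

```python
from lxml import etree

# Prefix-to-namespace mapping assumed for this sketch.
NAMESPACES = {
    "h": "http://www.w3.org/1999/xhtml",
    "re": "http://exslt.org/regular-expressions",
}

# Invented sample document in the XHTML namespace.
doc = etree.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>CHAPTER ONE</p><p>Some body text.</p><p>THE SECOND PART</p>"
    "</body></html>"
)

# The anchored expression from the first post, with two leading slashes.
anchored_expr = "//h:p[re:test(., '^[A-Z]+ [A-Z ]+$')]"
headings = [p.text for p in doc.xpath(anchored_expr, namespaces=NAMESPACES)]

# An unanchored pattern matches anywhere inside the paragraph text.
loose_expr = "//h:p[re:test(., 'body')]"
loose = [p.text for p in doc.xpath(loose_expr, namespaces=NAMESPACES)]

print(headings)
print(loose)
```

Pointing the same calls at the HTML file left in the debug directory, parsed with etree.parse() instead of the inline sample, is one way to check an expression before running a full conversion.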