#1 |
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2010
Device: Kindle
|
Calibre's XPath implementation for TOC detection
I'm converting a few books to go on my Kindle, and I'm trying to get the chapters set up the way I want. One thing I'm noticing is that Calibre seems to use a slightly different flavor of XPath than what I'm used to. Where I would write
Code:
//p[matches(., '^[A-Z ]+$')]
Calibre uses
Code:
//h:p[re:test(., '[A-Z]+')]
The regular expression I'm trying to use that works with matches but not with re:test follows:
Code:
/h:p[re:test(., '^[A-Z]+ [A-Z ]+$')]
#2 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
There's documentation on XPath in the manual. The behaviour of the regular expressions and the re:test function should be documented in the Python documentation.
|
#3 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
re:test()
re:test(src, pattern, flags) returns true if the string src matches the regular expression pattern.
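A rough way to picture that description in plain Python is sketched below. It assumes re:test does an unanchored search, the way Python's re.search does, and that an 'i' in the flags argument means case-insensitive. The name `re_test` is made up for illustration; it is not a calibre or lxml function.

```python
import re

def re_test(src, pattern, flags=""):
    """Hypothetical stand-in for re:test(src, pattern, flags).

    Assumption: the test succeeds if the pattern is found anywhere in src,
    and an 'i' in flags makes the match case-insensitive.
    """
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, src, re_flags) is not None

# Unanchored: a single capital letter anywhere in the string is enough.
loose = re_test("Some body text.", "[A-Z]+")
# Anchored: the whole string must consist of capitals and spaces.
strict = re_test("Some body text.", "^[A-Z ]+$")
# The pattern from the first post, applied to an all-caps heading.
heading = re_test("CHAPTER ONE", "^[A-Z]+ [A-Z ]+$")
print(loose, strict, heading)
```

If that assumption holds, an unanchored pattern like '[A-Z]+' will select almost every paragraph, so the ^ and $ anchors matter a lot for chapter detection.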
|
#4 |
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2010
Device: Kindle
|
I have an XPath visualization tool that I've been using to test expressions, querying against the HTML file that gets left in the debug 'processed' directory. Using that tool, I pull out the elements I'm looking for, but I use matches() instead of re:test. The tool isn't written in Python, so that's probably the difference.
I can check the manual, but is there a standard method for testing XPath queries aside from the generate-and-view method? I feel like the regular expression I'm trying to use isn't particularly complicated, and that it should work just fine.
#5 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
I don't know if XPath comes in several flavors as well, but for regexes, your test tool needs to support the Python flavor. Personally, I use Python to test expressions, though that may be a bit of overkill if you don't actually use it as a language...
|
#6 | ||
reader
Posts: 6,977
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
|
The XPath Tutorial says:
Quote:
Quote:
|
||
#7 | ||
Addict
Posts: 219
Karma: 404
Join Date: Nov 2010
Device: Kindle 3G, Samsung SIII
|
Quote:
Quote:
|
||
#8 |
creator of calibre
Posts: 45,349
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The h: prefix is not calibre specific, it's XHTML specific. It indicates you are looking for the <p> tag in the XHTML namespace. This is necessary because the conversion pipeline internally converts everything to XHTML in the intermediate stage.
As far as I am aware, calibre's XPath implementation is bog standard and its regex implementation uses Python's regex implementation.
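To see why the prefix matters, here is a small sketch using Python's standard-library ElementTree (chosen only so it runs without extra installs; it is not what calibre uses internally). The sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Tiny invented stand-in for the XHTML the conversion pipeline works on.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>CHAPTER ONE</p><p>Some body text.</p>"
    "</body></html>"
)

# Without a prefix, "p" means <p> in no namespace, so nothing matches.
plain = doc.findall(".//p")
# With h: bound to the XHTML namespace, both paragraphs match.
prefixed = doc.findall(".//h:p", {"h": XHTML_NS})

print(len(plain), len(prefixed))
```

The same expression without the prefix can still work in a tool that loads the file as plain HTML with no namespace, which would explain why //p and //h:p behave differently across tools.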
#9 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I use expressions like that, too, and like you, I think his problem has nothing to do with match, search or test differences. It's probably a simple case of not matching the test regex because there's some difference between the string regex sees in test and the string he's running match on.
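One way such a difference could arise, purely as an assumed example, is leading or trailing whitespace in the paragraph's text that a viewer hides. The sketch below uses Python's re module with invented strings to show an anchored pattern failing on the raw text and passing once whitespace is collapsed.

```python
import re

pattern = r"^[A-Z]+ [A-Z ]+$"

# What a heading looks like in a viewer versus what the file may contain.
clean = "CHAPTER ONE"
raw = "\n    CHAPTER ONE\n"  # assumed: indentation and newlines kept from the HTML source

# repr() makes the hidden whitespace visible.
print(repr(raw))

clean_ok = re.search(pattern, clean) is not None
raw_ok = re.search(pattern, raw) is not None

# Collapsing whitespace is roughly what XPath's normalize-space() does.
normalized = " ".join(raw.split())
normalized_ok = re.search(pattern, normalized) is not None

print(clean_ok, raw_ok, normalized_ok)
```

If this turns out to be the cause, testing normalize-space(.) instead of . inside re:test may be worth a try.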
|
#10 |
Addict
Posts: 219
Karma: 404
Join Date: Nov 2010
Device: Kindle 3G, Samsung SIII
|
Thanks for the correction and explanation, Kovid, it makes perfect sense now.
|
#11 |
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2010
Device: Kindle
|
Hey Kovid, could you tell us which XPath/DOM library you're using? I'm lazy and I don't want to check out the source, and I'd like to replicate what I'm seeing in the calibre TOC interface.
|
#12 |
creator of calibre
Posts: 45,349
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
lxml.
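For anyone wanting to try expressions outside calibre, a minimal sketch with lxml might look like the following. It assumes lxml is installed and binds the re: prefix to the EXSLT regular-expressions namespace; the sample document is invented, and how calibre itself sets up the namespaces is not shown here.

```python
from lxml import etree

# Prefix bindings: h: for XHTML, re: for EXSLT regular expressions.
NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "re": "http://exslt.org/regular-expressions",
}

# Invented sample; in practice you would parse the file left in the
# debug 'processed' directory instead.
doc = etree.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>CHAPTER ONE</p><p>Some body text.</p><p>PROLOGUE</p>"
    "</body></html>"
)

# The expression from the first post, with a leading // so it searches
# the whole document.
expr = "//h:p[re:test(., '^[A-Z]+ [A-Z ]+$')]"
titles = [p.text for p in doc.xpath(expr, namespaces=NS)]
print(titles)
```

Note that PROLOGUE is not selected, because the pattern requires at least one space between runs of capitals.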
|