|
|
#1 |
|
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2010
Device: Kindle
|
Calibre's XPath implementation for TOC detection
I'm converting a few books to go on my Kindle, and I'm trying to get the chapters set up the way I want. One thing I'm noticing is that Calibre seems to use a slightly different flavor of XPath than what I'm used to. Where I would write
Code:
//p[matches(., '^[A-Z ]+$')]
Calibre seems to want
Code:
//h:p[re:test(., '[A-Z]+')]
The regular expression I'm trying to use that works with matches but not with re:test follows:
Code:
/h:p[re:test(., '^[A-Z]+ [A-Z ]+$')]
|
|
|
|
|
#2 |
|
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
There's documentation on XPath in the manual. The behaviour of the regular expressions and the re:test function should be documented in the Python documentation.
|
|
|
|
|
|
|
|
#3 | |
|
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
re:test()
re:test(src, pattern, flags) returns true if the string src matches the regular expression pattern.
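To make the quoted signature concrete, here is a minimal sketch of calling re:test through lxml, assuming the EXSLT regular-expressions namespace is bound to the `re` prefix. The sample markup and the `texts` helper are made up for illustration. Note that the pattern is searched for anywhere in the string unless it is anchored, and that passing 'i' as the flags argument makes the match case-insensitive.

```python
from lxml import etree

# EXSLT regular-expressions namespace, bound here to the "re" prefix.
NS = {"re": "http://exslt.org/regular-expressions"}

# Made-up sample markup (no XHTML namespace, to keep the focus on re:test).
doc = etree.fromstring(
    "<root><p>CHAPTER ONE</p><p>Chapter two</p><p>plain text</p></root>"
)


def texts(expr):
    """Hypothetical helper: return the text of every <p> selected by expr."""
    return [p.text for p in doc.xpath(expr, namespaces=NS)]


# Unanchored: true for any <p> containing at least one capital letter.
unanchored = texts("//p[re:test(., '[A-Z]+')]")

# Anchored and case-sensitive.
case_sensitive = texts("//p[re:test(., '^CHAPTER')]")

# Third argument 'i' switches to case-insensitive matching.
case_insensitive = texts("//p[re:test(., '^chapter', 'i')]")

print(unanchored)
print(case_sensitive)
print(case_insensitive)
```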
|
|
|
|
|
|
#4 |
|
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2010
Device: Kindle
|
I have an XPath visualization tool that I've been using to test expressions, querying against the HTML file that gets left in the debug 'processed' directory. Using that tool, I pull out the elements I'm looking for, but I use matches() instead of re:test. The tool isn't written in Python, so that's probably the difference.
I can check the manual, but is there a standard method for testing XPath queries aside from the generate-and-view method? I feel like the regular expression I'm trying to use isn't particularly complicated, and that it should work just fine.
|
|
|
|
|
#5 |
|
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
I don't know if XPath comes in several flavors as well, but for regexes, your test tool needs to support the Python flavor. Personally, I use Python to test expressions, though that may be a bit overkill if you don't actually use it as a language...
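For anyone who wants to try the Python route, the following is a rough sketch of a test loop using lxml. The `try_xpath` helper, the prefix map, and the sample document are all assumptions for illustration; in practice the bytes would be read from whichever HTML file the debug run leaves behind.

```python
from lxml import etree

# Prefixes assumed for this sketch: "h" for XHTML, "re" for EXSLT regexes.
NAMESPACES = {
    "h": "http://www.w3.org/1999/xhtml",
    "re": "http://exslt.org/regular-expressions",
}

# Stand-in for a file from the debug output; replace with real file contents.
SAMPLE = b"""<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>CHAPTER ONE</p>
    <p>It was a dark and stormy night.</p>
    <p>CHAPTER TWO</p>
  </body>
</html>"""


def try_xpath(xhtml_bytes, expr):
    """Hypothetical helper: parse the document and return the text of each hit."""
    root = etree.fromstring(xhtml_bytes)
    return ["".join(el.itertext()) for el in root.xpath(expr, namespaces=NAMESPACES)]


hits = try_xpath(SAMPLE, "//h:p[re:test(., '^[A-Z]+ [A-Z ]+$')]")
for text in hits:
    print(repr(text))
```

Printing with repr() makes stray whitespace visible, which helps when an expression that looks right selects nothing.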
|
|
|
|
|
|
|
|
#6 | ||
|
reader
Posts: 6,977
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
|
The XPath Tutorial says:
Quote:
Quote:
|
||
|
|
|
|
|
#7 | ||
|
Addict
Posts: 219
Karma: 404
Join Date: Nov 2010
Device: Kindle 3G, Samsung SIII
|
Quote:
Quote:
|
||
|
|
|
|
|
#8 |
|
creator of calibre
Posts: 45,609
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The h: prefix is not calibre-specific, it's XHTML-specific. It indicates you are looking for the <p> tag in the XHTML namespace. This is necessary because the conversion pipeline internally converts everything to XHTML in the intermediate stage.
As far as I am aware, calibre's XPath implementation is bog standard and its regex implementation uses Python's regex implementation.
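A small sketch of why the prefix matters, assuming lxml and a made-up one-paragraph XHTML document: an unprefixed `//p` looks for a `<p>` in no namespace and finds nothing, while `//h:p` with `h` bound to the XHTML namespace finds the element.

```python
from lxml import etree

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up document; the default namespace puts every element in XHTML.
XHTML = (
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b"<body><p>CHAPTER ONE</p></body></html>"
)
root = etree.fromstring(XHTML)

# No prefix: XPath 1.0 treats this as "p in no namespace".
unprefixed = root.xpath("//p")

# Prefix bound to the XHTML namespace: matches the real element.
prefixed = root.xpath("//h:p", namespaces={"h": XHTML_NS})

print(len(unprefixed), len(prefixed))
print(prefixed[0].tag)  # lxml shows the namespace in braces before the tag name
```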
|
|
|
|
|
#9 |
|
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I use expressions like that, too, and like you, I think his problem has nothing to do with match, search or test differences. It's probably a simple case of not matching the test regex because there's some difference between the string regex sees in test and the string he's running match on.
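One way such a string difference can arise, shown as a sketch with lxml and made-up markup: if the paragraph text is wrapped in newlines and indentation, an anchored pattern fails against the raw string value but succeeds once the value is passed through normalize-space(). This is only one possible cause and is not confirmed as the one in this thread.

```python
from lxml import etree

NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "re": "http://exslt.org/regular-expressions",
}

# Made-up document: the heading text is surrounded by a newline and indentation.
XHTML = b"""<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p>
    CHAPTER ONE
</p>
</body></html>"""
root = etree.fromstring(XHTML)

PATTERN = "^[A-Z]+ [A-Z ]+$"

# The string the regex actually sees when "." is passed to re:test.
raw_value = root.xpath("string(//h:p)", namespaces=NS)

# Anchored test against the raw value: the leading newline defeats "^".
strict = root.xpath(f"//h:p[re:test(., '{PATTERN}')]", namespaces=NS)

# Same test after collapsing and trimming whitespace.
normalized = root.xpath(
    f"//h:p[re:test(normalize-space(.), '{PATTERN}')]", namespaces=NS
)

print(repr(raw_value))
print(len(strict), len(normalized))
```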
|
|
|
|
|
|
#10 |
|
Addict
Posts: 219
Karma: 404
Join Date: Nov 2010
Device: Kindle 3G, Samsung SIII
|
Thanks for the correction and explanation, Kovid, it makes perfect sense now.
|
|
|
|
|
|
#11 |
|
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2010
Device: Kindle
|
Hey Kovid, could you tell us which XPath/DOM library you're using? I'm lazy and I don't want to check out the source, and I'd like to replicate what I'm seeing in the calibre TOC interface.
|
|
|
|
|
|
#12 |
|
creator of calibre
Posts: 45,609
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
lxml.
|
|
|
|