MobileRead Forums - View Single Post

Manichean · 11-02-2010, 11:50 AM

Quote:

Originally Posted by janvanmaar

I think my suspicion was correct; the XHTML looks like this:

Code:

<p class="P-kapit">2.</p>
<p class="P-P32">Name of chapter</p>

So it seems that I cannot grep over multiple lines directly as they are not passed together to the TOC creation engine.

I've seen multiline matching work before, so I'm guessing that the creation engine sees the whole source at once. The more likely problem is that the regex (or XPath, which is what you'd have to use for TOC creation) doesn't match the source because of the tags sitting between the two pieces of text. I don't know XPath as well as I do regexes, but I'm guessing that

Code:

//h:p[re:test(., "[0-9]+\.")]/following-sibling::h:p[1][re:test(., "[a-z ]+", "i")]

should do the trick.

I don't know, however, if the matching works at all for multiple tags. Might be worth a try, though.
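If you want to check the idea outside of calibre first, here is a rough sketch in plain Python of the same pairing logic: find a paragraph that holds only a chapter number, then take the paragraph immediately after it as the title. This is not calibre's TOC engine, just an illustration using the standard library. The wrapping `body` element, the third paragraph, and the `chapter_pairs` helper are made up for the example, and it uses a full match on each paragraph, which is stricter than a plain regex test.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Sample markup modelled on the quoted snippet; the <body> wrapper and the
# third paragraph are assumptions added so the example parses and has a
# non-matching line to skip.
SAMPLE = f"""<body xmlns="{XHTML_NS}">
<p class="P-kapit">2.</p>
<p class="P-P32">Name of chapter</p>
<p class="P-P1">Ordinary body text.</p>
</body>"""

NUMBER = re.compile(r"[0-9]+\.")
TITLE = re.compile(r"[a-z ]+", re.IGNORECASE)


def chapter_pairs(root):
    """Return (number, title) for each numbered <p> directly followed by a title <p>."""
    p_tag = f"{{{XHTML_NS}}}p"
    pairs = []
    for parent in root.iter():
        children = list(parent)
        # Walk adjacent siblings: the two paragraphs sit next to each other,
        # one is not nested inside the other.
        for first, second in zip(children, children[1:]):
            if first.tag != p_tag or second.tag != p_tag:
                continue
            number = "".join(first.itertext()).strip()
            title = "".join(second.itertext()).strip()
            if NUMBER.fullmatch(number) and TITLE.fullmatch(title):
                pairs.append((number, title))
    return pairs


print(chapter_pairs(ET.fromstring(SAMPLE)))
```

The point of the sketch is that the number and the title are sibling elements, so whatever expression you end up with has to step from one paragraph to the next rather than look inside the first one.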