MobileRead Forums

BookGnome · 10-16-2010, 11:17 AM

Quote:

Originally Posted by smartmart

So I've saved the debug output and I've seen that the chapter headings are not inside an HTML tag (the text is still a child of the body tag, of course):
<p> foo foo foo</p>CHAPTER 1<p> foo foo foo </p>
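A quick way to see why this layout is awkward is to parse the quoted fragment and look at where the heading text ends up. The sketch below is a minimal illustration that assumes the fragment is well-formed XML and wraps it in a plain `<body>` with no namespace (real XHTML from a converter would carry the XHTML namespace, so tag names would look like `{http://www.w3.org/1999/xhtml}p`). It uses only Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# The fragment from the debug output, wrapped in <body> so it parses as one tree.
# Assumption: no XHTML namespace, to keep the tag names readable.
fragment = '<body><p> foo foo foo</p>CHAPTER 1<p> foo foo foo </p></body>'
body = ET.fromstring(fragment)

# Every element child of <body> is a <p>; neither of them contains the heading.
print([child.tag for child in body])
print([child.text for child in body])

# The heading is stored as the "tail" text of the first <p>: a bare text node
# that belongs to <body> rather than to an element of its own.
print(repr(body[0].tail))
```

Because "CHAPTER 1" has no element of its own, any element-based selector (XPath, CSS, and the like) has nothing to point at, which is why matching the text itself is the more promising route.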

I'm not sure how you need to specify it with Calibre's custom syntax, but your regex itself is flawed. Here's a working regex in Python:

Code:

>>> import re
>>> myString = '<p> foo foo foo</p>CHAPTER 1<p> foo foo foo </p>'
>>> re.findall(r'Chapter \d+', myString, re.I)
['CHAPTER 1']

A lot depends on how consistent the input file is, but this should catch any instance of the word 'chapter' followed by a single space and one or more digits, regardless of case. How to wrap that in Calibre's regex DSL is a question for the Calibre gurus.
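To make "how consistent the input file is" concrete, here is a small sketch comparing the pattern above with a more tolerant variant. The variant and the sample strings are hypothetical, chosen only to illustrate typical inconsistencies; whether any of them occur in a given book is an assumption. It uses plain Python `re`, not Calibre's own syntax.

```python
import re

# The pattern from the post, as a raw string so "\d" reaches the regex engine unchanged.
strict = re.compile(r'Chapter \d+', re.IGNORECASE)

# Hypothetical tolerant variant: \s+ accepts any run of whitespace (including a
# line break) between the word and the number, and \b requires "Chapter" to
# start at a word boundary so that e.g. "subchapter 5" is not matched.
tolerant = re.compile(r'\bChapter\s+\d+\b', re.IGNORECASE)

samples = [
    'CHAPTER 1',
    'chapter 12',
    'Chapter  3',     # two spaces between word and number
    'Chapter\n4',     # line break between word and number
    'Chapter One',    # spelled-out number: neither pattern handles this
    'subchapter 5',   # word embedded in a longer word
]

for s in samples:
    # Print each sample with whether each pattern finds a match in it.
    print(repr(s), bool(strict.search(s)), bool(tolerant.search(s)))
```

The strict pattern misses the double-space and line-break cases and accepts "subchapter 5", while the tolerant one handles all three; spelled-out numbers would need an explicit alternation of number words.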