Old 10-16-2010, 11:17 AM   #3
BookGnome
Voracious Reader
Posts: 4
Karma: 62
Join Date: Sep 2010
Device: Kindle
Finding chapters with a simple regex

Quote:
Originally Posted by smartmart
So I've saved the debug and I've seen that the chapters are not in an HTML tag (the text is, of course, a child of the body tag anyway):
<p> foo foo foo</p>CHAPTER 1<p> foo foo foo </p>
I'm not sure how you need to specify it with Calibre's custom syntax, but your regex itself is flawed. Here's a working regex in Python:

Code:
>>> import re
>>> myString = '<p> foo foo foo</p>CHAPTER 1<p> foo foo foo </p>'
>>> re.findall(r'Chapter \d+', myString, re.I)
['CHAPTER 1']
A lot depends on how consistent the input file is, but this should catch any instance of the word 'chapter' followed by a single space and one or more digits, without regard to case. How to wrap that in Calibre's regex DSL is a question for the Calibre gurus.
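If the file turns out to be less consistent (extra spaces or a line break between the word and the number, or 'chapter' appearing inside a longer word), a slightly looser pattern might help. Here's a minimal sketch in plain Python, not Calibre syntax; the names CHAPTER_RE and find_chapters and the second heading in the sample string are made up for illustration:

```python
import re

# Hypothetical helper, not part of Calibre.
# \b stops the pattern from matching inside a longer word (e.g. "subchapter"),
# \s+ allows any run of whitespace between the word and the number.
CHAPTER_RE = re.compile(r'\bchapter\s+\d+', re.IGNORECASE)

def find_chapters(text):
    """Return a list of (offset, matched_text) pairs, one per heading found."""
    return [(m.start(), m.group(0)) for m in CHAPTER_RE.finditer(text)]

# Sample input: the string from the quoted post plus an invented second heading.
sample = '<p> foo foo foo</p>CHAPTER 1<p> foo foo foo </p>chapter   12<p>bar</p>'

for offset, heading in find_chapters(sample):
    print(offset, repr(heading))
```

The offsets are only there so you can check where in the file each hit landed; whether the \b and \s+ pieces carry over unchanged into Calibre's settings is something I haven't verified.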