09-07-2010, 08:12 AM   #2
ldolse
Try this:
<br>\s*(title1|title2|title3|title\s+with\s+spaces)\s*<br>
Replace title1, title2, etc. with whatever your chapter titles are.
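
If you have a lot of chapter titles, it may be easier to generate the alternation than to type it by hand. Here is a minimal sketch using Python's re module. The titles, the sample text and the build_header_pattern helper are all made up for illustration, so treat it as a starting point rather than the one right way to do it:

```python
import re

# Hypothetical chapter titles; substitute the ones that appear in your file.
titles = ["Preface", "The Long Road", "Epilogue"]

def build_header_pattern(titles):
    # Escape each word, then allow any run of whitespace between words,
    # mirroring the title\s+with\s+spaces alternative in the pattern above.
    alternatives = [
        r"\s+".join(re.escape(word) for word in title.split())
        for title in titles
    ]
    return re.compile(r"<br>\s*(" + "|".join(alternatives) + r")\s*<br>")

pattern = build_header_pattern(titles)

# Made-up sample: a running header sitting between two <br> tags mid-sentence.
sample = "the road went on <br>\nThe Long  Road\n<br>for another mile. See the Preface."

# Only the title wrapped in <br> tags is removed; the plain mention of
# "Preface" in the body text is left alone.
cleaned = pattern.sub("", sample)
print(pattern.pattern)
print(cleaned)
```

The printed pattern can then be pasted into the header/footer field and checked with the test function.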

You can also add <hr>\s* to the beginning or \s*<hr> to the end (depending on whether it's a header or a footer) to tie the match more tightly to the page headers. If you tie it to the <hr> tag you might be able to get away with something more generic, like this:

<br>\s*(\w+\s*)+\s*<br>\s*<hr>
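
To get a feel for what that generic pattern does and doesn't catch, here is a rough check in Python's re module. The three sample strings are invented for illustration and your markup may well differ:

```python
import re

# The generic pattern from the post: any run of words between two <br> tags,
# but only when an <hr> page break follows.
header_before_hr = re.compile(r"<br>\s*(\w+\s*)+\s*<br>\s*<hr>")

# Made-up page boundary with a running title just before the <hr>.
page_break = "last line of the page<br>\nSome Chapter Title<br>\n<hr>\nfirst line of the next page<br>"
# Ordinary short lines that are not followed by <hr>.
body_text = "He paused<br>\nJust then<br>\nthe door opened."
# A title containing punctuation, which \w does not match.
punctuated = "end of page<br>\nDon't Look Back<br>\n<hr>"

stripped = header_before_hr.sub("", page_break)
untouched = header_before_hr.sub("", body_text)

print(repr(stripped))
print(untouched == body_text)
print(header_before_hr.search(punctuated) is None)
```

Two things to watch for: \w only covers letters, digits and underscore, so titles with apostrophes, colons or other punctuation slip through, and a nested quantifier like (\w+\s*)+ can get slow on long lines that almost match.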

Use the test function with some of those examples to see if you can get what you need.

http://www.regular-expressions.info/ is the best place to read up on how to use regex.

edit - here's a sample regex I used for a file which also had chapter title headers:
Code:
((Castello\s|The\s(Phleg|nun|night|prince\sof\smus|garden|secret\spalac)|Epilogu|Prefac|Four\scarnival|Amalf|La\sSiren|Marriage\sto|Montevergin|Spaccanapol|A\sstiletto|Gesualdo\sC)[^<]+<br>\s*)?(\d|[xvi])+<br>\s*(The\sD\s*e\s*v\s*i\s*l\s*[^<]+<br>)?\s*((Bh|27)[^<]+<br>\s*){4,4}\s*<hr>\s*<A name=\d+></a>
I believe in this case it was a footer. <A name=\d+></a> also shows up at every page break, so including it in the pattern is another way to tie the regex to the header/footer.
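
As a cut-down illustration of anchoring on the page-break tag, the sketch below keeps only the page-number part of the regex above plus the <hr> and <A name=\d+></a> tail. The sample HTML is invented, and this is a simplification for demonstration, not the full pattern I used:

```python
import re

# Page number (arabic or lowercase roman), then <hr>, then the page anchor.
footer = re.compile(r"(\d|[xvi])+<br>\s*<hr>\s*<A name=\d+></a>")

# Made-up sample: a page-number footer at a page break, plus a line of body
# text ending in a number that is NOT followed by <hr> and the anchor.
html = (
    "end of page text<br>\n12<br>\n<hr>\n<A name=13></a>"
    "next page text<br>\nIn 1920<br>\nhe left.<br>"
)
roman = "front matter<br>\nxiv<br>\n<hr>\n<A name=15></a>more"

result = footer.sub("", html)
roman_result = footer.sub("", roman)

print(repr(result))
print(repr(roman_result))
```

Because the match has to end in <hr> and the anchor tag, the "1920" in the body text survives even though it sits right before a <br>.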

Last edited by ldolse; 09-07-2010 at 08:29 AM.