03-02-2010, 05:42 AM   #1
DarkKipper
Structure Detection - Remove Header (or Footer) Regex

Is there a good way to reference variables, such as the title of the book, in the regular expression?

I've noticed that a lot of books, particularly those converted from PDF, have the book title in the header of every page, which interrupts the flow of the text, for example:

title</p><p>

I have quite a good regex set up to remove the common file path footer, page numbers alone on a line, and traces of the ABBYY and ABC Amber converters, and it would be nice to remove a repeated title automatically as well. I know I can always add the actual title string manually for a specific conversion, but it'd be great not to have to.
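
To make the idea concrete, here is a rough Python sketch of the effect I'm after, run outside the converter. It assumes the title is already available as a plain string and that the repeated header sits in its own <p> paragraph, which may not hold for every conversion. The function names are made up for illustration and are not part of any converter's API.

```python
import re


def build_title_header_regex(title):
    """Build a pattern for a paragraph that contains only the book title.

    Hypothetical helper: assumes the header appears as <p>title</p>.
    """
    # Escape each word so characters such as '.' or '?' match literally,
    # and allow any run of whitespace between the words of the title.
    words = [re.escape(word) for word in title.split()]
    title_pattern = r"\s+".join(words)
    return re.compile(r"<p>\s*" + title_pattern + r"\s*</p>\s*", re.IGNORECASE)


def strip_title_headers(html, title):
    """Remove every paragraph that consists solely of the title."""
    if not title.strip():
        # With no title there is nothing safe to remove.
        return html
    return build_title_header_regex(title).sub("", html)


sample = "<p>It was a dark night.</p><p>Moby Dick</p><p>The ship sailed on.</p>"
print(strip_title_headers(sample, "Moby Dick"))
```

Escaping the title matters because titles often contain punctuation that would otherwise be read as regex syntax. What I'd like is for the converter to do this substitution for me, with the title filled in from the book's metadata.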

Any thoughts?