Structure Detection - Remove Header (or Footer) Regex
Is there any good way of referencing variables like the title of the book in the regular expression?
I've noticed that a lot of books, particularly those converted from PDF, have the book title in the header of every page, which interrupts the flow of the text, like this:
title</p><p>
I have quite a good regex set up to remove the common file path footer, page numbers alone on a line, and traces of the ABBYY and ABC Amber converters, and it would be nice to automatically remove a repeated title as well. I know I can always manually add the actual string for a specific conversion, but it'd be great to do it automatically.
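To illustrate what I mean, here is a rough sketch in plain Python, outside of any conversion tool. It assumes the title is already available as a string (for example from the book's metadata) and that the stray header always looks like the example above, i.e. the title followed by `</p><p>`. The function name `strip_repeated_title` is made up for this example.

```python
import re

def strip_repeated_title(html: str, title: str) -> str:
    """Remove a page-header title that was merged into the text flow."""
    # re.escape() makes any regex metacharacters in the title literal,
    # so titles containing '.', '?', '(' and so on are matched as-is.
    pattern = re.compile(
        r"\s*" + re.escape(title) + r"\s*</p>\s*<p>\s*",
        flags=re.IGNORECASE,
    )
    # Replace the header and the paragraph break it caused with one space,
    # rejoining the sentence that was split across the page boundary.
    return pattern.sub(" ", html)

sample = "<p>It was a dark night and the My Book Title</p><p>rain fell.</p>"
cleaned = strip_repeated_title(sample, "My Book Title")
print(cleaned)
```

This would also strip a genuine occurrence of the title at the very end of a paragraph, so it is only a starting point.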
Any thoughts?