01-15-2011, 10:11 AM   #1
adad
Junior Member
adad began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Jan 2011
Device: kindle
Regexp and Alternate Page Header/Footer

I am new to ebooks and Calibre, and I am only starting to understand some of the great things it can do; I am still on the simple parts. I do have some Unix background, though more from the grep era than Python. Clearly I am not used to the message board formatting yet either.

I am stuck on the regexp to remove the alternating headers. I more or less gave up on elegance and went for brute force, as I don't understand it well [I actually have the title and the author typed in manually]. The original PDF has a different header on odd and even pages, with information in the top corners only.

Quote:

1. On the PDF Page the header is:

TITLE [white space] [page no] - odd pages
[page no] [white space] AUTHOR - even pages


2. On the Calibre display page in "Structure Detection"
for the odd pages
<hr>
<A name=7></a>TITLE <br>
3 <br>

for the even pages
<hr>
<A name=6></a>2 <br>
AUTHOR <br>

3. Header regular expression
(?im)(<hr>((\s*<a name=\d+></a>\s*\d+\s*<br>\s*AUTHOR\s*<br>)|(\s*<a name=\d+></a>TITLE\s*<br>\s*\d+\s*<br>)))

When I test the expression on the display, everything I want to delete is highlighted in yellow.
I have the delete header box checked, but when I run the conversion (to MOBI) the page numbers effectively show up in place of the header. Is the expression returning some sort of value or match number, and how do I stop it from being inserted?
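
For anyone who wants to reproduce this outside Calibre, here is a minimal sketch that runs the same expression through Python's `re` module. It assumes the two snippets above are representative of the intermediate HTML; the body-text lines, the `SAMPLE` string, and the `strip_headers` helper are made up for illustration and are not part of Calibre. It only checks what the pattern matches and what is left after substituting an empty string, so it says nothing definite about what the conversion pipeline does afterwards.

```python
import re

# Sample markup based on the "Structure Detection" display above.
# TITLE and AUTHOR stand in for the literal strings typed into the pattern;
# the "Body text" lines are invented so there is something left after removal.
SAMPLE = (
    "<hr>\n"
    "<A name=7></a>TITLE <br>\n"
    "3 <br>\n"
    "Body text of an odd page.<br>\n"
    "<hr>\n"
    "<A name=6></a>2 <br>\n"
    "AUTHOR <br>\n"
    "Body text of an even page.<br>\n"
)

# The header expression exactly as posted, split over two lines for readability.
HEADER = re.compile(
    r"(?im)(<hr>((\s*<a name=\d+></a>\s*\d+\s*<br>\s*AUTHOR\s*<br>)"
    r"|(\s*<a name=\d+></a>TITLE\s*<br>\s*\d+\s*<br>)))"
)


def strip_headers(html):
    """Hypothetical helper: replace every header match with an empty string."""
    return HEADER.sub("", html)


# Show each full match so it is clear the page number is inside the match.
for match in HEADER.finditer(SAMPLE):
    print(repr(match.group(0)))

cleaned = strip_headers(SAMPLE)
print(cleaned)
```

With this sample, both headers match in full, page number included, and substituting an empty string leaves only the body lines. The capturing groups are not written back unless the replacement string refers to them, so in plain Python the expression by itself does not insert anything.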