MobileRead Forums - View Single Post

adad · 01-15-2011, 11:11 AM

I am new to ebooks and Calibre, and I am only starting to understand some of the great things it can do. I am still on the simple parts. I do have some Unix background -- but more from the grep era than Python. Clearly I am not used to the message board formatting yet either.

I am stuck on the regexp to remove the alternating headers. I more or less dumped elegance and went for brute force, as I don't fully understand regexps [I actually have the title and the author manually typed in]. The original PDF has a different header on odd and even pages -- information in the top corners only.

Quote:

1. On the PDF Page the header is:

TITLE [white space] [page no] - odd pages
[page no] [white space] AUTHOR - even pages

2. On the Calibre display page in "Structure Detection"
for the odd pages
<hr>
<A name=7></a>TITLE 
3 

for the even pages
<hr>
<A name=6></a>2 
AUTHOR 

3. Header regular expression
(?im)(<hr>((\s*<a name=\d+></a>\s*\d+\s* \s*AUTHOR\s* )|(\s*<a name=\d+></a>TITLE\s* \s*\d+\s* )))

On the display, what I want to delete is highlighted in yellow when I test.
I have the delete header box checked, but when I run the conversion (to MOBI) the page numbers effectively end up inserted where the header was. Is the expression returning some sort of value or match number, and how do I stop it from being inserted?
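
To check the pattern outside Calibre, here is a minimal Python sketch. It only shows what Python's own `re` module does with the expression from item 3 when every match is replaced by an empty string. That Calibre's conversion applies the pattern the same way is an assumption, and the `SAMPLE` text and the `strip_headers` name are made up for illustration (the body lines are invented; only the header lines follow item 2).

```python
import re

# Pattern copied from item 3 of the post; (?im) = ignore case + multiline.
HEADER_RE = re.compile(
    r"(?im)(<hr>((\s*<a name=\d+></a>\s*\d+\s* \s*AUTHOR\s* )"
    r"|(\s*<a name=\d+></a>TITLE\s* \s*\d+\s* )))"
)

# Hypothetical sample: one even-page and one odd-page header, laid out as in
# item 2 (note the trailing space after each header line), plus invented body text.
SAMPLE = (
    "<hr>\n<A name=6></a>2 \nAUTHOR \n"
    "Text of an even page.\n"
    "<hr>\n<A name=7></a>TITLE \n3 \n"
    "Text of an odd page.\n"
)


def strip_headers(html: str) -> str:
    """Replace every header match with an empty string (assumed behaviour)."""
    return HEADER_RE.sub("", html)


cleaned = strip_headers(SAMPLE)
print(repr(cleaned))
```

In plain Python the substitution leaves neither the page numbers nor the TITLE/AUTHOR text behind. So if the numbers still show up after conversion, the difference may lie in how the conversion step sees the text rather than in the pattern itself, though that is a guess.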