Old 01-15-2011, 10:11 AM   #1
adad
Junior Member
adad began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Jan 2011
Device: kindle
Regexp and Alternate Page Header/Footer

I am new to ebooks and Calibre, and I am only starting to understand some of the great things it can do; I am still on the simple parts. I do have some Unix background -- but more from the grep era than Python. Clearly I am not used to the message board formatting yet either.

I am stuck on the regexp to remove the alternating headers. I more or less dumped elegance and went for brute force, as I don't fully understand the syntax [I actually have the title and the author typed in manually]. The original PDF has a different header on odd and even pages -- information in the top corners only.

Quote:

1. On the PDF Page the header is:

TITLE [white space] [page no] - odd pages
[page no] [white space] AUTHOR - even pages


2. On the Calibre display page in "Structure Detection"
for the odd pages
<hr>
<A name=7></a>TITLE <br>
3 <br>

for the even pages
<hr>
<A name=6></a>2 <br>
AUTHOR <br>

3. Header regular expression
(?im)(<hr>((\s*<a name=\d+></a>\s*\d+\s*<br>\s*AUTHOR\s*<br>)|(\s*<a name=\d+></a>TITLE\s*<br>\s*\d+\s*<br>)))

On the display, what I want to delete is highlighted in yellow when I test.
I have the delete header box checked but when I run the conversion (to MOBI) I get effectively the page numbers inserted in place of the header. Is the string returning some sort of value or match number, and how do I stop it from inserting?
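
For anyone wanting to check the expression outside the wizard, here is a minimal sketch using Python's `re` module (Calibre's regexps are Python regexps). The `sample` string is a made-up stand-in built from the two snippets in point 2, not the real conversion output, and `header_re`, `matches` and `cleaned` are just illustrative names. It only shows what the pattern matches on that sample; it says nothing about which input file the conversion actually reads.

```python
import re

# Hypothetical sample modelled on the "Structure Detection" snippets above.
sample = (
    "<hr>\n"
    "<A name=7></a>TITLE <br>\n"
    "3 <br>\n"
    "Body text of an odd page.<br>\n"
    "<hr>\n"
    "<A name=6></a>2 <br>\n"
    "AUTHOR <br>\n"
    "Body text of an even page.<br>\n"
)

# The header expression from point 3, unchanged.
# (?i) makes <a ...> match <A ...>; (?m) has no effect here, since no ^ or $ is used.
header_re = re.compile(
    r"(?im)(<hr>((\s*<a name=\d+></a>\s*\d+\s*<br>\s*AUTHOR\s*<br>)"
    r"|(\s*<a name=\d+></a>TITLE\s*<br>\s*\d+\s*<br>)))"
)

# Collect every header the pattern finds, then delete them all.
matches = [m.group(0) for m in header_re.finditer(sample)]
cleaned = header_re.sub("", sample)

print(len(matches), "headers matched")
print(cleaned)
```

On this sample both header variants match with the page number included, so the expression itself looks sound when the markup really has this shape.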
Old 01-15-2011, 10:51 AM   #2
itimpi
Wizard
itimpi ought to be getting tired of karma fortunes by now.
 
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
I am guessing that your regex is incomplete and the headers actually have the page number there as well - so you are deleting all except the page number.

I always use the wizard button next to the regex field so that I can check that I have matched sufficient text.
Old 01-15-2011, 10:53 AM   #3
Manichean
Wizard
Manichean is the 'tall, dark, handsome stranger' all the fortune-tellers are referring to.
 
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
From what I see, that shouldn't be happening. Did you make sure the page number is highlighted as well?
You could also try to use the header removal for one type of the headers and the footer removal for the other. There may be some funky effect in that regex that I failed to notice.
Old 01-15-2011, 08:40 PM   #4
adad
Junior Member
adad began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Jan 2011
Device: kindle
No, definitely used the wizard -- I am glad I found it. The page numbers are definitely with the yellow background.

If it is of any help, when I go back to the MOBI version, the header has been translated to:

<p class="calibre_11">2 </p><p class="calibre_11">
</p><p class="calibre_11">

-- where 2 is the page number, and

<p class="calibre_11">
</p><p class="calibre_11">3 </p>

-- where 3 is the page number.

They are mirror images of each other, but actually look the same in the Kindle reader.


Quote:
Originally Posted by itimpi
I am guessing that your regex is incomplete and the headers actually have the page number there as well - so you are deleting all except the page number.

I always use the wizard button next to the regex field so that I can check that I have matched sufficient text.
Old 01-15-2011, 08:51 PM   #5
adad
Junior Member
adad began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Jan 2011
Device: kindle
Page numbers are included in the highlighted yellow.

Tried the footer and header thing. I took out the (?im) and could then drop all the brackets. With a little care over the case, everything was highlighted in the wizard (for the appropriate footer and header), but the result was the same.

That eliminated most of the rules I was not familiar with -- and what is left is a pretty straightforward regexp. Shame I cannot use grep.
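
A rough sketch of that split, written as plain Python so it can be tried outside Calibre. The `sample` markup is hypothetical (modelled on the snippets from the first post), and the names `odd_header` and `even_header` are mine; in Calibre one expression would go in the header field and the other in the footer field. Without the (?i) flag, `<A` and the upper-case TITLE/AUTHOR have to be typed exactly as they appear in the markup.

```python
import re

# Hypothetical page markup, modelled on the snippets quoted earlier in the thread.
sample = (
    "<hr>\n<A name=7></a>TITLE <br>\n3 <br>\nOdd page text.<br>\n"
    "<hr>\n<A name=6></a>2 <br>\nAUTHOR <br>\nEven page text.<br>\n"
)

# No (?im) flags and no grouping brackets: case must match the markup exactly.
odd_header = r"<hr>\s*<A name=\d+></a>TITLE\s*<br>\s*\d+\s*<br>"
even_header = r"<hr>\s*<A name=\d+></a>\s*\d+\s*<br>\s*AUTHOR\s*<br>"

# Apply the two expressions one after the other, as two separate removal passes.
step1 = re.sub(odd_header, "", sample)
step2 = re.sub(even_header, "", step1)

print(step2)
```

On this sample the two simpler expressions together remove the same text as the combined one, page numbers included.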
Old 01-15-2011, 09:03 PM   #6
adad
Junior Member
adad began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Jan 2011
Device: kindle
Thank you for the suggestion. I tried it and it did not work -- but you got me to thinking. I eventually ended up deleting the old MOBI file and then it did work.

I looked at some of my other conversions, and it may be that Calibre resets the input file to the old MOBI file if one exists. When I used the wizard, I had a choice of input and chose the PDF file. Perhaps the conversion was actually running the regexp against the old MOBI file, not the PDF file. Something else I will have to watch for in future efforts.

Thank you both for the assistance.


Quote:
Originally Posted by Manichean
From what I see, that shouldn't be happening. Did you make sure the page number is highlighted as well?
You could also try to use the header removal for one type of the headers and the footer removal for the other. There may be some funky effect in that regex that I failed to notice.