Old 12-19-2010, 11:43 AM   #1
MacEvansCB
Enthusiast
Posts: 25
Karma: 10
Join Date: Nov 2010
Location: Somewhere in Iowa
Device: Nook Color
Need help with a footer

I have been able to remove part of a footer during conversion. The regex builder shows me:
Code:
<b>Page  1</b><br>
which works when using:
Code:
<b>Page  \d+</b><br>
But this leaves a blank line in the text for every removal.
The full footer actually is:
Code:
<b>Page  1</b><br>
<hr>
which spans two lines; the <hr> leaves a blank line in the converted file.
I can get the regex to match either line on its own, which removes either the page number or the blank line.
But I have no clue how to get both lines recognized together.
Can someone please give me the correct incantation to link these two lines?
Old 12-19-2010, 12:06 PM   #2
MacEvansCB
Jeez ... found my own answer. Looking at the defaults, I found a vertical bar ("pipe") character and the old UNIX pile in my head woke up. I tried:
Code:
<b>Page  \d+</b><br>|<hr>
which worked perfectly.
Old 12-19-2010, 12:56 PM   #3
Manichean
Wizard
Posts: 3,130
Karma: 80520
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
That's one way to do it: the regex will then match either the subexpression on the left side of the pipe or the one on the right side. You could also use something like
Code:
<b>Page\s+\d+</b><br>\s+<hr>
which will match the whitespace that makes up the line break. Another way to do it would be
Code:
(?s)<b>Page\s+\d+</b><br>.+?<hr>
which uses the (?s) flag to let the dot wildcard match any character, including newlines, and then uses the lazy .+? to match the newline between the two tags.
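For anyone who wants to compare the three patterns outside calibre, here is a minimal Python sketch (calibre's regexes follow Python syntax). The sample string is an assumption modelled on the footer quoted above, with one extra unrelated <hr> added to show where the approaches differ.

```python
import re

# Sample input modelled on the footer in this thread (assumed layout),
# plus one standalone <hr> that is NOT part of a footer.
html = (
    "some text<br>\n"
    "<b>Page  1</b><br>\n"
    "<hr>\n"
    "more text<br>\n"
    "<hr>\n"
    "<b>Page  2</b><br>\n"
    "<hr>\n"
    "last line<br>\n"
)

# 1) Alternation: matches the page line OR any <hr>, each independently.
alternation = re.compile(r"<b>Page\s+\d+</b><br>|<hr>")

# 2) \s+ spans the line break, so both tags are matched as one unit.
whitespace = re.compile(r"<b>Page\s+\d+</b><br>\s+<hr>")

# 3) (?s) lets "." match newlines; ".+?" is lazy and stops at the first <hr>.
dotall = re.compile(r"(?s)<b>Page\s+\d+</b><br>.+?<hr>")

results = {
    "alternation": alternation.sub("", html),
    "whitespace": whitespace.sub("", html),
    "dotall": dotall.sub("", html),
}
for name, cleaned in results.items():
    print(name, repr(cleaned))
```

One thing the sketch makes visible: the pipe version removes every <hr> in the file, including ones that are not part of a footer, while the other two only remove an <hr> that directly follows a page line.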
Old 04-10-2011, 10:48 PM   #4
MacEvansCB
What if a line starts with the page number??

I've been working with my eBooks in calibre for several months now without any problems, but I just ran into something I can't figure out.

The following is a page break from one of my files, with the ending and starting text included:
Code:
nervously  with  the  other  specialists  stationed  at  the <br>
1 <br>
 <br>
<hr>
<A name=6></a>end of the runway, waiting for the bombing mission to <br>
So I would normally do:
Code:
+\d+ <br>|<hr>|<A name=+\d+></a>
But the parser won't take an expression starting with a plus sign.
What does one do in this situation???
Old 04-10-2011, 11:26 PM   #5
atjnjk
Zealot
Posts: 105
Karma: 554
Join Date: Oct 2008
Device: none
Normally, I select and copy everything I want to remove into the "Regex:" textbox:
Code:
<br>
1 <br>
 <br>
<hr>
<A name=6></a>
Then I replace every number in that textbox with "\d+"
Code:
<br>
\d+ <br>
 <br>
<hr>
<A name=\d+></a>
Be careful with whitespace.
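The copy-then-generalise steps above can be sketched in Python. This is only an illustration: the helper name `generalise` is made up, and turning every whitespace run into \s+ is my own addition so that stray spaces or \r\n line endings still match.

```python
import re

def generalise(sample: str) -> str:
    """Turn a copied block into a regex: digits -> \\d+, whitespace -> \\s+.

    Hypothetical helper, not part of calibre.
    """
    def repl(match):
        token = match.group(0)
        if token.isdigit():
            return r"\d+"
        if token.isspace():
            return r"\s+"
        # Everything else is literal text, escaped in case it contains
        # regex metacharacters.
        return re.escape(token)

    # Split the sample into runs of digits, whitespace, or anything else.
    return re.sub(r"\d+|\s+|[^\d\s]+", repl, sample)

# The block copied from the page break quoted earlier in the thread.
sample = "<br>\n1 <br>\n <br>\n<hr>\n<A name=6></a>"
pattern = re.compile(generalise(sample))

page_a = "stationed  at  the <br>\n1 <br>\n <br>\n<hr>\n<A name=6></a>end of the runway"
page_b = "foo <br>\r\n27  <br>\r\n <br>\r\n<hr>\r\n<A name=32></a>bar"

cleaned_a = pattern.sub("", page_a)
cleaned_b = pattern.sub("", page_b)
print(repr(cleaned_a))
print(repr(cleaned_b))
```

Because the leading <br> is part of the selection, the text before and after the page break ends up joined on one line.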

I think a standalone "+" doesn't do anything and is an error. You should read "An introduction to regular expressions" and "All about using regular expressions in calibre".

Last edited by atjnjk; 04-10-2011 at 11:36 PM.
Old 04-11-2011, 11:07 AM   #6
user_none
Sigil & calibre developer
Posts: 2,473
Karma: 1053245
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by MacEvansCB View Post
But the parser won't take an expression starting with a plus sign.
What does one do in this situation???
The plus sign has a special meaning: match one or more of the expression or character before it. There is nothing before it here, so the expression is invalid. If you want to match a literal plus sign character, you need to escape it (\+). Escaping tells the parser to treat it as the character itself.
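A quick way to see this for yourself is in plain Python, whose regex syntax calibre uses. The sample strings below are made up for illustration.

```python
import re

# A pattern that starts with "+" gives the quantifier nothing to repeat,
# so compiling it fails.
try:
    re.compile(r"+\d+ <br>")
    leading_plus_ok = True
except re.error as exc:
    leading_plus_ok = False
    print("rejected:", exc)

# Dropping the stray "+" gives the intended "one or more digits".
page_line = re.compile(r"\d+ <br>")
page_hits = page_line.findall("at  the <br>\n1 <br>\n <br>\n")
print(page_hits)

# Escaping it instead matches a literal plus sign in the text.
literal_plus = re.compile(r"\+\d+")
plus_hits = literal_plus.findall("offset +42 and 17")
print(plus_hits)
```

So for the page break in the question, the leading "+" can simply be dropped; it only needs escaping if the text really contains a plus sign.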
Old 04-12-2011, 02:48 PM   #7
Tharadalf
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2011
Device: Kindle
Hello everyone,
I am also trying to remove page numbers, which in text look like this:
Code:
3 <br>
So I came up with this regexp:
Code:
^\d+ <br>$
but it is not working for me. What am I doing wrong?
Old 04-12-2011, 02:57 PM   #8
user_none
Quote:
Originally Posted by Tharadalf View Post
Hello everyone,
I am also trying to remove page numbers, which in text look like this:
Code:
3 <br>
So I came up with this regexp:
Code:
^\d+ <br>$
but it is not working for me. What am I doing wrong?
1) Use (?mu) before ^ to enable multiline matching. Otherwise ^ matches only the start of the string.

2) Use \s instead of a space so that newline characters match too. Also add + to match multiple spaces.
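Here is a small Python sketch of both points. The sample text is an assumption about what the converted file looks like around a page number.

```python
import re

# Assumed sample: a page number sitting on its own line between text lines.
text = "waiting at the <br>\n3 <br>\nend of the runway <br>\n"

# Without (?m), ^ and $ anchor to the whole string, so this finds nothing.
plain = re.compile(r"^\d+ <br>$")
plain_hit = plain.search(text)

# (?m) makes ^ and $ match at every line; \s+ replaces the literal space.
# The "u" flag is redundant for str patterns in Python 3 but harmless.
multiline = re.compile(r"(?mu)^\d+\s+<br>$")
multi_hits = multiline.findall(text)

cleaned = multiline.sub("", text)
print(plain_hit)
print(multi_hits)
print(repr(cleaned))
```

Note that the newline after the removed page number stays behind as an empty line in this sketch; the pattern would need to include it if that matters.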
Old 04-12-2011, 06:41 PM   #9
Tharadalf
Quote:
Originally Posted by user_none View Post
1) Use (?mu) before ^ to enable multiline matching. Otherwise ^ matches only the start of the string.

2) Use \s instead of a space so that newline characters match too. Also add + to match multiple spaces.
Yeah, this worked just fine, thank you man.