Need help with a footer


MacEvansCB
12-19-2010, 10:43 AM
I have been able to remove part of a footer during conversion. The regex builder shows me:
<b>Page 1</b><br>
which works when using:
<b>Page \d+</b><br>
But this leaves a blank line in the text for every removal.
The full footer actually is:
<b>Page 1</b><br>
<hr>
which is on two lines, where the HR leaves a blank line in the converted file.
I can get the regex to match either line on its own, which removes either the page number or the blank line.
But I have no clue how to get both lines recognized together.
Can someone please give me the correct incantation to link these two lines????

MacEvansCB
12-19-2010, 11:06 AM
Jeez... found my own answer. Looking at the defaults, I found a vertical bar ("pipe") character, and the old UNIX knowledge in my head woke up. I tried:
<b>Page 1</b><br>|<hr>
which worked perfectly.

Manichean
12-19-2010, 11:56 AM
That's one way to do it: the regex will then match either the subexpression on the left side of the pipe or the one on the right side. You could also use something like <b>Page\s+\d+</b><br>\s+<hr>, which matches the whitespace characters that make up the line break. Another way would be (?s)<b>Page\s+\d+</b><br>.+?<hr>, which uses a flag to tell the dot wildcard to match any character, including newlines, and then uses .+? (one or more characters, as few as possible) to match the line break.
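
To see the three approaches side by side, here is a minimal sketch using Python's re module, assuming calibre's regex fields follow Python syntax (which is why the (?s) flag works there). The footer sample is made up for illustration; a real conversion may have different whitespace around the tags.

```python
import re

# Made-up sample of one page break: the page-number line followed by the <hr>.
html = "last line of page one<br>\n<b>Page 1</b><br>\n<hr>\nfirst line of page two"

patterns = {
    # Either side of the pipe; each piece is matched (and removed) on its own.
    "alternation": r"<b>Page \d+</b><br>|<hr>",
    # \s+ consumes the space and the line break between the two tags.
    "whitespace": r"<b>Page\s+\d+</b><br>\s+<hr>",
    # (?s) lets the dot match newlines; the lazy .+? stops at the first <hr>.
    "dotall": r"(?s)<b>Page\s+\d+</b><br>.+?<hr>",
}

results = {name: re.sub(pattern, "", html) for name, pattern in patterns.items()}
for name, cleaned in results.items():
    print(f"{name:12} {cleaned!r}")
```

One difference worth knowing: the alternation removes every <hr> in the book, not only the ones after a page line, while the other two only remove an <hr> that follows "Page N". Also, .+? needs at least one character, so if an <hr> came directly after <br> with nothing in between, it would run on to the next <hr>; \s* avoids that.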

MacEvansCB
04-10-2011, 09:48 PM
I've been working with my eBooks in calibre for several months now without any problems, but I just ran into something I can't figure out.

The following is a page break from one of my files, with the ending and starting text included:
nervously with the other specialists stationed at the <br>
1 <br>
<br>
<hr>
<A name=6></a>end of the runway, waiting for the bombing mission to <br>
So I would normally do:
+\d+ <br>|<hr>|<A name=+\d+></a>
But the parser won't take an expression starting with a plus sign.
What does one do in this situation???

atjnjk
04-10-2011, 10:26 PM
Normally, I select and copy everything I want to remove into the "Regex:" text box:
<br>
1 <br>
<br>
<hr>
<A name=6></a>
Then I replace every number in that text box with "\d+":
<br>
\d+ <br>
<br>
<hr>
<A name=\d+></a>
Be careful with whitespace.
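
The copy-then-generalise routine can be sketched in Python. The helper literal_to_pattern is hypothetical (it is not a calibre feature): it escapes the copied text, swaps every run of digits for \d+, and, as one way of being careful with whitespace, swaps every run of whitespace for \s*. The sample is the page break quoted earlier in the thread.

```python
import re

def literal_to_pattern(sample: str) -> str:
    """Hypothetical helper: build a regex from text copied out of the book."""
    parts = []
    # Split the sample into digit runs, whitespace runs and everything else.
    for digits, space, other in re.findall(r"(\d+)|(\s+)|([^\d\s]+)", sample):
        if digits:
            parts.append(r"\d+")            # any page number / anchor number
        elif space:
            parts.append(r"\s*")            # tolerate extra, missing or wrapped spaces
        else:
            parts.append(re.escape(other))  # keep tags and text literal
    return "".join(parts)

# The page break from the thread, with the surrounding text.
text = (
    "nervously with the other specialists stationed at the <br>\n"
    "1 <br>\n"
    "<br>\n"
    "<hr>\n"
    "<A name=6></a>end of the runway, waiting for the bombing mission to <br>"
)
copied = "<br>\n1 <br>\n<br>\n<hr>\n<A name=6></a>"

pattern = literal_to_pattern(copied)
print(pattern)
print(re.sub(pattern, "", text))
```

Using \s* is deliberately loose; a literal space or \s+ is stricter if the spacing in the book is consistent.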

I think a "+" with nothing in front of it doesn't do anything and is an error. You should read "An introduction to regular expressions" (http://www.mobileread.com/forums/showthread.php?t=118569) and "All about using regular expressions in calibre" (http://calibre-ebook.com/user_manual/regexp.html).

user_none
04-11-2011, 10:07 AM
The plus sign has a special meaning: it means "match one or more of the expression or character before it". There is nothing before it, so it's an invalid expression. If you want to match a plus sign character, you need to escape it with a backslash (\+). Escaping tells the parser to treat it as the character itself.
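
A quick way to check this outside calibre is Python's re module, which rejects a pattern that starts with a bare "+" and accepts the escaped form \+. This is an illustrative sketch; the helper name compile_error is made up.

```python
import re

def compile_error(pattern: str):
    """Hypothetical helper: return the error message if the pattern is invalid, else None."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# A leading "+" has nothing before it to repeat.
print(compile_error(r"+\d+ <br>"))

# Escaped, "\+" is just a literal plus sign in front of the digits.
print(re.findall(r"\+\d+", "page +12 and +3"))

# If no literal "+" was intended, dropping it gives a valid pattern.
print(compile_error(r"\d+ <br>|<hr>|<A name=\d+></a>"))
```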

Tharadalf
04-12-2011, 01:48 PM
Hello everyone,
I am also trying to remove page numbers, which in text look like this:

3 <br>

So I came up with this regexp:

^\d+ <br>$

but it is not working for me. What am I doing wrong?

user_none
04-12-2011, 01:57 PM
1) Use (?mu) before ^ to enable multiline matching (m = multiline, u = Unicode). Otherwise ^ (and $) only match at the start (and end) of the whole string.

2) Use \s instead of a space so that newline characters are matched too. Also add + to match multiple spaces.
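
Both tips together can be checked in Python, assuming calibre's regex fields use the same Python inline-flag syntax. The sample text is invented: one page-number line between two lines of book text.

```python
import re

text = "end of a sentence <br>\n3 <br>\nstart of the next one <br>\n"

# Without (?m), ^ only anchors at the very start of the whole string.
print(re.findall(r"^\d+ <br>$", text))

# (?m) makes ^ and $ match at every line; (?u) is Unicode matching (already
# the default for str patterns in Python 3). \s+ allows one or more whitespace
# characters, including a line break, between the number and the tag.
page_number = re.compile(r"(?mu)^\d+\s+<br>$")
print(page_number.findall(text))
print(repr(page_number.sub("", text)))
```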

Tharadalf
04-12-2011, 05:41 PM
Yeah, this worked just fine, thank you man.