#1
Member
Posts: 14
Karma: 10
Join Date: Mar 2010
Location: Switzerland
Device: Kobo Clara HD, Kobo Aura H2O, Sony PRS-300, FBReader
Help with a regex
I would like to remove a footer that looks like this in the intermediate HTML output:

<b>1</b></p><p>FOOTER - CHAPTER, PARAGRAPH AND PAGE</p><p>

The actual page number comes right after the "<b>" tag. After the first "<p>" comes text that is always the same (FOOTER), and after the "-" comes text that changes (CHAPTER, PARAGRAPH AND PAGE), which prevents me from using a simple "Remove All" command. The input is a PDF file; the output can be EPUB or any other format.

I have looked at the regex documentation, but since I have never done anything like this, it is too steep a mountain for a beginner to climb. Basically, I would like a regex that tells calibre to remove an expression that:
- starts with <b>
- ends with </p><p>
- contains a number between "<b>" and "</b>" that changes on every page
- contains "</b></p><p>FOOTER -"
- contains something variable (text and numbers) between "-" and "</p><p>"

It should simply be removed, not replaced by anything. If calibre cannot do this, is there a way to do it with a script running on either a Windows or a Linux computer?

Thanks to anyone who can help me, or at least point me in the right direction.
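The conditions described above can be written almost literally as a pattern. Below is a minimal Python sketch, assuming that FOOTER stands for the real fixed text in the book (pass it through re.escape if it contains characters such as "." or "(") and that the variable part never contains an HTML tag, so [^<]* can stop at the next "<". The names FOOTER_RE and strip_footers are illustrative only.

```python
import re

# Hypothetical pattern built from the description: "<b>", a page number,
# "</b></p><p>FOOTER -", any tag-free text, then the closing "</p><p>".
FOOTER_RE = re.compile(r"<b>\d+</b></p><p>FOOTER -[^<]*</p><p>")


def strip_footers(html: str) -> str:
    """Remove every footer occurrence, replacing it with nothing."""
    return FOOTER_RE.sub("", html)


sample = ("<p>Some text<b>1</b></p><p>FOOTER - CHAPTER, PARAGRAPH AND PAGE"
          "</p><p> More text</p>")
print(strip_footers(sample))
# → <p>Some text More text</p>
```

Because this pattern ends with "</p><p>", the surrounding paragraph tags stay balanced after the removal. In calibre itself, presumably only the pattern string would be pasted into the search field, with the replacement left empty.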
#2
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
(?s)<b>\d+</b>.*?FOOTER.*?</p>
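To apply this regex outside calibre with a script that runs on Windows or Linux, one option is sketched below in Python 3.9+. It assumes the book's chapters are already available as plain UTF-8 .html/.htm/.xhtml files (for example, an unzipped copy of an EPUB). The function names and the strip_footers.py filename are hypothetical. Two properties of the pattern are worth knowing. First, it stops at "</p>", so the "<p>" that followed the footer is left in place. Second, because (?s) lets "." cross line breaks, any other bold number such as <b>42</b> would make the match run on to the next FOOTER and delete the text in between. Running the script on a copy and checking the reported count is a sensible precaution.

```python
import re
import sys
from pathlib import Path

# The regex from the reply: (?s) lets "." match newlines, and each lazy ".*?"
# stops at the first "FOOTER" and then at the first "</p>" after it.
FOOTER_RE = re.compile(r"(?s)<b>\d+</b>.*?FOOTER.*?</p>")
HTML_SUFFIXES = {".html", ".htm", ".xhtml"}


def clean_text(html: str) -> tuple[str, int]:
    """Return the text with footers removed and the number of removals."""
    return FOOTER_RE.subn("", html)


def clean_tree(root: Path) -> int:
    """Rewrite every HTML file under root in place; return total removals."""
    total = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in HTML_SUFFIXES:
            continue
        cleaned, count = clean_text(path.read_text(encoding="utf-8"))
        if count:
            path.write_text(cleaned, encoding="utf-8")
            total += count
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python strip_footers.py path/to/unzipped_book
    print(clean_tree(Path(sys.argv[1])), "footer(s) removed")
```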
Tags: regex
Thread | Thread Starter | Forum | Replies | Last Post |
What's wrong with this regex? | crutledge | Sigil | 1 | 05-11-2010 01:49 PM |
What a regex is | Worldwalker | Calibre | 20 | 05-10-2010 05:51 AM |
Multiline Regex? | prky | Calibre | 25 | 05-01-2010 09:56 PM |
help with regex expression | daesdaemar | Workshop | 4 | 02-19-2010 07:38 AM |
Regex help... | Bobthebass | Workshop | 6 | 04-26-2009 03:54 PM |