|04-05-2010, 06:13 AM||#1|
Help with a regex
I would like to remove a footer that in the HTML intermediate output looks like this:
<b>1</b></p><p>FOOTER - CHAPTER, PARAGRAPH AND PAGE</p><p>
As you can see, the actual page number sits between "<b>" and "</b>". After the first "<p>" comes text that is always the same (FOOTER), and after the "-" comes text that changes from page to page (CHAPTER, PARAGRAPH AND PAGE), which prevents me from using a simple "Remove All" command.
The input is a PDF file, the output can be an EPUB or whatever.
I have looked at the regex documentation but, as I have never done anything like this, it is too steep a mountain to climb for a beginner.
Basically I would like a regex that tells calibre to remove an expression
that starts with "<b>"
that ends with "</p><p>"
that contains between "<b>" and "</b>" a number that changes on every page
that contains "</b></p><p>FOOTER -"
and that contains between "-" and "</p><p>" something variable (text and numbers), whatever it is.
It should be just removed. Not replaced by anything.
If calibre cannot do it, is there a way to do it with a script running either on a Windows or Linux computer?
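For the script route, the description above could be sketched roughly like this in Python, which runs on both Windows and Linux. This is only a sketch under assumptions: that the page number is made of digits only, that the variable part after the "-" never contains a "<" character, and that at most some whitespace sits between the tags. The names FOOTER_RE and strip_footers are made up for illustration.

```python
import re

# Assumed footer shape (taken from the example in this post):
#   <b>NUMBER</b></p><p>FOOTER - anything without tags</p><p>
# \d+    : the page number, which changes on every page
# \s*    : tolerate optional whitespace or line breaks between tags (assumption)
# [^<]*  : the variable text after the "-", up to the next tag
FOOTER_RE = re.compile(r"<b>\d+</b></p>\s*<p>FOOTER\s*-[^<]*</p>\s*<p>")


def strip_footers(html: str) -> str:
    """Remove every footer match, replacing it with nothing."""
    return FOOTER_RE.sub("", html)


sample = (
    "end of page one<b>1</b></p><p>FOOTER - CHAPTER 2, PARAGRAPH 3 AND PAGE 1</p><p>"
    "start of page two"
)
print(strip_footers(sample))
```

Because the match includes the closing "</p>" and the following "<p>", removing it joins the text before and after the footer into one paragraph. If the real footers differ from the example (extra attributes on the tags, for instance), the pattern would need adjusting.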
Thanks to anyone who can help me, or at least point me in the right direction.