MobileRead Forums > E-Book Software > Calibre

04-05-2010, 05:13 AM   #1
A.T.E.
Help with a regex

I would like to remove a footer that looks like this in the intermediate HTML output:

<b>1</b></p><p>FOOTER - CHAPTER, PARAGRAPH AND PAGE</p><p>

As you can see, the actual page number comes right after the "<b>" tag. After the first "<p>" there is some text that is always the same (FOOTER), and after the "-" there is text that changes (CHAPTER, PARAGRAPH AND PAGE), which prevents me from using a simple "Remove All" command.

The input is a PDF file, the output can be an EPUB or whatever.

I have looked at the regex documentation but, as I have never done anything like this before, it is too steep a mountain to climb for a beginner.

Basically I would like to have a regex that tells calibre to remove an expression that starts with:

<b>

that ends with:

</p><p>

that contains, between "<b>" and "</b>", a number that changes on every page

that contains "</b></p><p>FOOTER -"

and that contains, between "-" and "</p><p>", something variable (text and numbers), whatever it is.

It should simply be removed, not replaced by anything.
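
The requirements listed above can be spelled out piece by piece as a pattern. What follows is only a minimal Python sketch under some assumptions: the name FOOTER_RE and the sample text around the footer are made up for illustration, and "something variable" is taken to mean any text that contains no further tags.

```python
import re

# Each fragment mirrors one requirement from the list above.
FOOTER_RE = re.compile(
    r"""
    <b>\d+</b>          # starts with <b>, then the changing page number, then </b>
    </p><p>FOOTER\ -    # the fixed part; the space is escaped because of VERBOSE mode
    [^<]*               # the variable text after the dash (assumed to contain no tags)
    </p><p>             # ends with </p><p>
    """,
    re.VERBOSE,
)

# Hypothetical page text wrapped around the footer quoted in this post.
sample = (
    "<p>Some text <b>1</b></p><p>FOOTER - CHAPTER, PARAGRAPH AND PAGE</p>"
    "<p>More text</p>"
)

# Replace every match with nothing, i.e. remove it.
cleaned = FOOTER_RE.sub("", sample)
print(cleaned)
```

Because the whole span up to and including the closing "</p><p>" is removed, the text before and after the footer ends up in the same paragraph.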

If calibre cannot do it, is there a way to do it with a script running either on a Windows or Linux computer?

Thanks to those who will help me or at least point me in the right direction.
04-05-2010, 07:50 AM   #2
kovidgoyal
creator of calibre
(?s)<b>\d+</b>.*?FOOTER.*?</p>
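
As for running this from a script on Windows or Linux, the same pattern should work with Python's re module. This is only a rough sketch: the function name strip_footers and the sample HTML are hypothetical, and nothing here is specific to calibre.

```python
import re

# The pattern exactly as posted above; (?s) lets "." match newlines too.
POSTED_RE = re.compile(r"(?s)<b>\d+</b>.*?FOOTER.*?</p>")


def strip_footers(html: str) -> str:
    """Return html with every match of the posted pattern removed."""
    return POSTED_RE.sub("", html)


# Hypothetical sample modelled on the footer in the first post.
sample = (
    "<p>End of page one <b>1</b></p><p>FOOTER - CHAPTER 1, PAGE 1</p>"
    "<p>Start of page two <b>2</b></p><p>FOOTER - CHAPTER 1, PAGE 2</p>"
    "<p>Start of page three</p>"
)

cleaned = strip_footers(sample)
print(cleaned)
```

Two things to keep in mind. The pattern stops at the first "</p>" after FOOTER, so the "<p>" that follows the footer stays in the output and the paragraph before it is left without its own closing tag. Also, since ".*?" may cross tags and newlines, a bold number that is not part of a footer would make the match run on to the next FOOTER, so it is worth checking the result on a copy of the file first.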

All times are GMT -4.