01-11-2013, 09:01 AM   #1
sws began at the beginning.
Posts: 32
Karma: 10
Join Date: Oct 2012
Device: Pocket Book Touch Lux
python regex: delete text in preprocessing

Hi all,

I am using a calibre recipe (Weltonline, a German daily newspaper) to fetch news daily. Everything works fine, but in the final epub file there is plenty of web rubbish after every news article that I want to get rid of. So far I open the epub downloaded from my calibre server in Sigil and delete everything between two string groups:

(     <div class="calibre7">
                   Axel Springer AG 2013. Alle Rechte vorbehalten)([\s\S ]*?)(Weitere Hinweise</a></li>)
Now I have seen that calibre can apply regex preprocessing while fetching the news. But I have also seen that Python takes a slightly different approach to regex, and I am neither an expert in regex dialects nor a Python programmer. It took me quite a while to figure out the regex above.

Could someone tell me how to use the above regex in the aforementioned recipe so that it runs as part of this preprocessing?
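
For reference, here is a rough sketch of how the pattern above might translate to Python's `re` module. It is only a guess at the shape, not a tested recipe change: calibre recipes accept a `preprocess_regexps` list of (compiled pattern, replacement function) pairs, and the list below follows that shape. The `apply_preprocess` helper and the `sample` string are made up so the snippet can be tried outside calibre. One assumption to check: `class="calibre7"` comes from the converted epub, so the raw HTML that the recipe downloads may use different markup and the pattern may need adjusting.

```python
import re

# Same idea as the Sigil pattern: match from the copyright line up to the
# "Weitere Hinweise" link, non-greedily. With re.DOTALL, "." also matches
# newlines, so the [\s\S] trick is not needed. The leading whitespace of
# the original pattern is dropped and \s* absorbs the indentation instead.
FOOTER = re.compile(
    r'<div class="calibre7">\s*'
    r'Axel Springer AG 2013\. Alle Rechte vorbehalten'
    r'.*?Weitere Hinweise</a></li>',
    re.DOTALL,
)

# Same shape as a recipe's preprocess_regexps attribute: each entry is
# (compiled pattern, function that takes the match and returns the
# replacement text). Returning '' deletes the matched span.
preprocess_regexps = [(FOOTER, lambda match: '')]


def apply_preprocess(html, rules=preprocess_regexps):
    """Hypothetical helper: apply each rule in order, outside calibre."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Made-up sample input, only to exercise the pattern.
sample = (
    '<p>Article text.</p>\n'
    '<div class="calibre7">\n'
    '                   Axel Springer AG 2013. Alle Rechte vorbehalten\n'
    '  <ul><li><a href="#">Weitere Hinweise</a></li>\n'
    '</ul></div>'
)

cleaned = apply_preprocess(sample)
print(cleaned)
```

Inside a recipe, the `preprocess_regexps` line would presumably go into the recipe class body, with `import re` at the top of the recipe file. Note that deleting the span this way leaves the closing tags after the match in place, just as it does in Sigil.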
