MobileRead Forums - View Single Post

Olger · 12-13-2011, 10:35 PM

Hi There,
I'm still new to Calibre, and haven't used Python before (though plenty of other languages). I'm slowly getting the knack of it, but need some pointers on a few things.

I've got a recipe for the time.com RSS feeds, and that all works nicely. However, Time has the annoying habit of inserting a little "advertisement" in their RSS feeds. Every ad starts with "<p><strong>MORE:</strong>" and ends with the closing </p>.
Of course, they use non-classed <p> tags elsewhere, so simply removing all <p> tags won't work. I figured postprocess_html is the way to go: parse the 'soup' for <p> tags and remove the ones that contain "<strong>MORE:</strong>".
But, that just leaves me creating the code... Anyone able to give some pointers?

Cheers! Olger.
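
For reference, here is a minimal sketch of one way to filter those paragraphs. It works on the raw HTML string with Python's standard re module rather than on the soup, so it can be tried outside Calibre. The names MORE_AD and strip_more_ads and the sample markup are made up for illustration, and the pattern assumes each ad really is a single <p> that opens with <strong>MORE:</strong>, as described above.

```python
import re

# Hypothetical pattern: a <p> whose first child is <strong>MORE:</strong>,
# running up to the first closing </p>. DOTALL lets the ad span several lines;
# IGNORECASE also catches variants such as <P> or "More:".
MORE_AD = re.compile(
    r'<p\b[^>]*>\s*<strong>\s*MORE:\s*</strong>.*?</p>',
    re.DOTALL | re.IGNORECASE,
)


def strip_more_ads(html):
    """Return html with every 'MORE:' promo paragraph removed."""
    return MORE_AD.sub('', html)


# Made-up sample input, not actual time.com markup.
sample = (
    '<p>First paragraph of the story.</p>'
    '<p><strong>MORE:</strong> <a href="#">Some other article</a></p>'
    '<p>Second paragraph, with <strong>bold</strong> text.</p>'
)
cleaned = strip_more_ads(sample)
print(cleaned)
```

Inside a recipe, the same compiled pattern could probably be handed to Calibre through the preprocess_regexps attribute, e.g. preprocess_regexps = [(MORE_AD, lambda match: '')], but that part is untested here. The equivalent soup-based approach in postprocess_html would loop over the <p> tags and drop the ones whose first child is a <strong> starting with "MORE:".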

12-13-2011, 10:35 PM	#1
Olger Member Posts: 11 Karma: 10 Join Date: Nov 2011 Device: Kobo Touch	Cleaning up tags based on content