Old 02-03-2010, 08:04 AM   #1339
TBR
Device: Sony PRS-505, Asus eeePC 1000H
I'm still having trouble getting a recipe for

http://p.yimg.com/bw/rss/nachrichten/bundeswehr.xml

cleared of unnecessary clutter; I am still getting artifacts.
The modified basic news recipe works in principle and removes much of the clutter, but the output still includes, among other things, a "ghost" of an ad:
Quote:
class AdvancedUserRecipe1264591440(BasicNewsRecipe):
    title = u'Bundeswehr'
    oldest_article = 7
    max_articles_per_feed = 100
    remove_tags_after = dict(name='div', attrs={'id':'content'})
    remove_tags_before = dict(name='div', attrs={'id':'content'})
    feeds = [(u'Bundeswehr in AFP und AP', u'http://p.yimg.com/bw/rss/nachrichten/bundeswehr.xml')]
Could anyone jump in with advice?

I want to build a "filtered" recipe that scans several RSS feeds and drops every article that does not contain certain keywords, so that only matching news items end up in the generated e-book. The result would be an instant press review on a given theme, person, event, etc. Kovidgoyal has confirmed that this is possible with calibre:
Quote:
Originally Posted by kovidgoyal View Post
If you've seen http://bazaar.launchpad.net/~kovid/c.../feeds/news.py

there's not much more I can tell you. Basically, you can completely customize the news download process by overriding the methods of that class. So if you want to create a composite recipe, you would create a parse_index method that lists all the current articles in your various news sources. Then you would override postprocess_html to check for the required keywords and, if they are absent, return None.
Unfortunately, I'm afraid this is currently beyond my programming/scripting skills. As this would be a rather extensive recipe, I'm hesitant to simply request it in this forum, but could someone post a recipe with a keyword filter so I can learn from the example?
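
To make the request more concrete, here is a minimal sketch of just the keyword check, written in plain Python so it can be tried outside calibre. Everything in it is an assumption for illustration: the helper names (visible_text, matches_keywords, filter_index) are hypothetical and not part of calibre, and the whole-word, case-insensitive matching rule is only one possible choice. The filter_index helper assumes the list that a parse_index method returns has the shape [(feed title, [article dicts])], and it only looks at each article's title and description, not at the downloaded page.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def matches_keywords(html, keywords):
    """True if any keyword occurs as a whole word, ignoring case."""
    text = visible_text(html)
    return any(
        re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE)
        for kw in keywords
    )


def filter_index(index, keywords):
    """Keep only articles whose title or description matches a keyword.

    `index` is assumed to look like [(feed_title, [article_dict, ...]), ...].
    Feeds that end up with no matching articles are dropped entirely.
    """
    filtered = []
    for feed_title, articles in index:
        kept = [
            a for a in articles
            if matches_keywords(
                a.get("title", "") + " " + a.get("description", ""), keywords
            )
        ]
        if kept:
            filtered.append((feed_title, kept))
    return filtered
```

If I understand the quoted advice correctly, matches_keywords would be called from an overridden postprocess_html on the text of each downloaded page, and filter_index could be applied to the result of parse_index to skip obviously irrelevant articles before they are fetched. I have not been able to wire this into a working recipe myself, so that part is untested.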