09-21-2011, 11:44 PM   #1
romualdinho
Device: Kindle3
Duplicated news in recipe with multiple feeds

Hello everybody,

I have a question about the configuration of recipes.

The site I'm working with has a separate RSS feed for each tag/topic used in its articles.
In my recipe I added feeds for the topics I'm interested in.
The problem is that an article can have many tags, so it can show up in two or more feeds and then appears twice (or three times, or four...) in the generated news.

Is it possible to remove the duplicate articles from the recipe's output?

This is my code:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1316656601(BasicNewsRecipe):
    title          = u'Mongabay'
    oldest_article = 120  # in days
    max_articles_per_feed = 100
    auto_cleanup = True
    remove_tags    = [dict(name='p', attrs={'class':'hide'})]

    # One feed per topic; the same article can appear in several of them.
    feeds          = [
        (u'Amazon', u'http://news.mongabay.com/xml/amazon1.xml'),
        (u'Species discovery', u'http://news.mongabay.com/xml/species_discovery1.xml'),
        (u'Rainforest animals', u'http://news.mongabay.com/xml/rainforest%20animals1.xml'),
        (u'Cats', u'http://news.mongabay.com/xml/cats1.xml'),
        (u'Pantanal', u'http://news.mongabay.com/xml/pantanal1.xml'),
    ]

    def print_version(self, url):
        # Rewrite the article URL to use the print. subdomain.
        return url.replace('http://', 'http://print.')
It's still a basic recipe (for now).
For example, the feed titled 'Amazon' has an article that is also in 'Rainforest animals'.
I want to keep only one copy of each duplicated article. Is that possible?
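
To make the behaviour I'm after concrete, here is a rough sketch in plain Python. It is not calibre code: the function name dedupe_feeds, the (feed title, list of article dicts) layout and the example.com URLs are all made up for illustration. I'm assuming two entries count as the same article when their URLs match, and that the copy in the first feed listed is the one to keep.

```python
def dedupe_feeds(feeds):
    """Keep only the first occurrence of each article URL across all feeds.

    `feeds` is a list of (feed_title, [article_dict, ...]) pairs, where each
    article_dict has at least a 'url' key. This layout is hypothetical and
    only stands in for whatever the recipe really works with.
    """
    seen = set()          # URLs already kept in an earlier feed
    result = []
    for title, articles in feeds:
        unique = []
        for article in articles:
            url = article['url']
            if url in seen:
                continue  # already listed under a previous feed, skip it
            seen.add(url)
            unique.append(article)
        result.append((title, unique))
    return result


# Made-up sample data: the 'jaguar' article is tagged in both feeds.
sample = [
    ('Amazon', [
        {'title': 'Jaguar sighting', 'url': 'http://example.com/jaguar'},
        {'title': 'River levels', 'url': 'http://example.com/river'},
    ]),
    ('Rainforest animals', [
        {'title': 'Jaguar sighting', 'url': 'http://example.com/jaguar'},
        {'title': 'New frog', 'url': 'http://example.com/frog'},
    ]),
]

for feed_title, articles in dedupe_feeds(sample):
    print(feed_title, [a['title'] for a in articles])
```

With this, the jaguar article would stay under 'Amazon' and be dropped from 'Rainforest animals'. What I don't know is where something like this could be hooked into a recipe.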

Any help will be appreciated.