View Single Post
Old 11-29-2010, 06:02 PM   #1
RedDogInCan
Member
RedDogInCan began at the beginning.
 
RedDogInCan's Avatar
 
Posts: 13
Karma: 10
Join Date: Nov 2010
Location: Australia
Device: Kindle DX
Recipe for ABC News (Australia)

Here is a recipe for loading news articles from the ABC News RSS feeds http://www.abc.net.au/news/feeds/rss.htm . The ABC is the independent public broadcaster here in Australia and is considered to be relatively unbiased compared to other news sources.

The recipe currently loads articles from Top Stories, the major capital cities, Australia, World, Business, and Science and Technology feeds, but the ABC also provides other feeds on a wide range of topics and geographical areas in Australia.

One problem is that they have a habit of including video only articles in the feed and I haven't thought of a way to filter them out.

Code:
__license__   = 'GPL v3'
__copyright__ = '2010, Dean Cording'
'''
abc.net.au/news
'''
import re
from calibre.web.feeds.recipes import BasicNewsRecipe

class ABCNews(BasicNewsRecipe):
    title                  = 'ABC News'
    __author__             = 'Dean Cording'
    description            = 'News from Australia'
    masthead_url           = 'http://www.abc.net.au/news/assets/v5/images/common/logo-news.png'
    cover_url              = 'http://www.abc.net.au/news/assets/v5/images/common/logo-news.png'

    oldest_article         = 2
    max_articles_per_feed  = 100
    no_stylesheets         = False
    #delay                  = 1
    use_embedded_content   = False
    encoding               = 'utf8'
    publisher              = 'ABC News'
    category               = 'News, Australia, World'
    language               = 'en_AU'
    publication_type       = 'newsportal'
    preprocess_regexps     = [(re.compile(r'<!--.*?-->', re.DOTALL), lambda m: '')]
    conversion_options = {
                             'comments'        : description
                            ,'tags'            : category
                            ,'language'        : language
                            ,'publisher'       : publisher
                            ,'linearize_tables': False
                         }

    keep_only_tags    =  dict(id='article')

    remove_tags = [dict(attrs={'class':['related', 'tags']}),
                     dict(id='statepromo')
                        ]

    remove_attributes = ['width','height']

    feeds          = [
                      ('Top Stories', 'http://www.abc.net.au/news/syndicate/topstoriesrss.xml'),
                      ('Canberra', 'http://www.abc.net.au/news/indexes/idx-act/rss.xml'),
                      ('Sydney', 'http://www.abc.net.au/news/indexes/sydney/rss.xml'),
                      ('Melbourne', 'http://www.abc.net.au/news/indexes/melbourne/rss.xml'),
                      ('Brisbane', 'http://www.abc.net.au/news/indexes/brisbane/rss.xml'),
                      ('Perth', 'http://www.abc.net.au/news/indexes/perth/rss.xml'),
                      ('Australia', 'http://www.abc.net.au/news/indexes/idx-australia/rss.xml'),
                      ('World', 'http://www.abc.net.au/news/indexes/world/rss.xml'),
                      ('Business', 'http://www.abc.net.au/news/indexes/business/rss.xml'),
                      ('Science and Technology', 'http://www.abc.net.au/news/tag/science-and-technology/rss.xml'),
                    ]
RedDogInCan is offline   Reply With Quote