Here is a recipe for loading news articles from the ABC News RSS feeds
http://www.abc.net.au/news/feeds/rss.htm . The ABC is the independent public broadcaster here in Australia and is considered to be relatively unbiased compared to other news sources.
The recipe currently loads articles from Top Stories, the major capital cities, Australia, World, Business, and Science and Technology feeds, but the ABC also provides other feeds on a wide range of topics and geographical areas in Australia.
One problem is that they have a habit of including video only articles in the feed and I haven't thought of a way to filter them out.
Code:
__license__ = 'GPL v3'
__copyright__ = '2010, Dean Cording'
'''
abc.net.au/news
'''
import re
from calibre.web.feeds.recipes import BasicNewsRecipe
class ABCNews(BasicNewsRecipe):
title = 'ABC News'
__author__ = 'Dean Cording'
description = 'News from Australia'
masthead_url = 'http://www.abc.net.au/news/assets/v5/images/common/logo-news.png'
cover_url = 'http://www.abc.net.au/news/assets/v5/images/common/logo-news.png'
oldest_article = 2
max_articles_per_feed = 100
no_stylesheets = False
#delay = 1
use_embedded_content = False
encoding = 'utf8'
publisher = 'ABC News'
category = 'News, Australia, World'
language = 'en_AU'
publication_type = 'newsportal'
preprocess_regexps = [(re.compile(r'<!--.*?-->', re.DOTALL), lambda m: '')]
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
,'linearize_tables': False
}
keep_only_tags = dict(id='article')
remove_tags = [dict(attrs={'class':['related', 'tags']}),
dict(id='statepromo')
]
remove_attributes = ['width','height']
feeds = [
('Top Stories', 'http://www.abc.net.au/news/syndicate/topstoriesrss.xml'),
('Canberra', 'http://www.abc.net.au/news/indexes/idx-act/rss.xml'),
('Sydney', 'http://www.abc.net.au/news/indexes/sydney/rss.xml'),
('Melbourne', 'http://www.abc.net.au/news/indexes/melbourne/rss.xml'),
('Brisbane', 'http://www.abc.net.au/news/indexes/brisbane/rss.xml'),
('Perth', 'http://www.abc.net.au/news/indexes/perth/rss.xml'),
('Australia', 'http://www.abc.net.au/news/indexes/idx-australia/rss.xml'),
('World', 'http://www.abc.net.au/news/indexes/world/rss.xml'),
('Business', 'http://www.abc.net.au/news/indexes/business/rss.xml'),
('Science and Technology', 'http://www.abc.net.au/news/tag/science-and-technology/rss.xml'),
]