View Single Post
Old 11-29-2010, 06:12 PM   #1
RedDogInCan
Member
RedDogInCan began at the beginning.
 
RedDogInCan's Avatar
 
Posts: 13
Karma: 10
Join Date: Nov 2010
Location: Australia
Device: Kindle DX
Recipe for Business Spectator (Australia)

Here is a recipe to collect articles from The Business Spectator, an Australian business news and commentary web site.

Code:
__license__   = 'GPL v3'
__copyright__ = '2010, Dean Cording'
'''
abc.net.au/news
'''
import re
from calibre.web.feeds.recipes import BasicNewsRecipe

class BusinessSpectator(BasicNewsRecipe):
    title                  = 'Business Spectator'
    __author__             = 'Dean Cording'
    description            = 'Australian Business News & commentary delivered the way you want it.'
    masthead_url           = 'http://www.businessspectator.com.au/bs.nsf/logo-business-spectator.gif'
    cover_url              = masthead_url

    oldest_article         = 2
    max_articles_per_feed  = 100
    no_stylesheets         = True
    #delay                  = 1
    use_embedded_content   = False
    encoding               = 'utf8'
    publisher              = 'Business Spectator'
    category               = 'News, Australia, Business'
    language               = 'en_AU'
    publication_type       = 'newsportal'
    preprocess_regexps     = [(re.compile(r'<!--.*?-->', re.DOTALL), lambda m: '')]
    conversion_options = {
                             'comments'        : description
                            ,'tags'            : category
                            ,'language'        : language
                            ,'publisher'       : publisher
                            ,'linearize_tables': False
                         }

    keep_only_tags    =  [dict(id='storyHeader'), dict(id='body-html')]

    remove_tags = [dict(attrs={'class':'hql'})]

    remove_attributes = ['width','height','style']

    feeds          = [
                      ('Top Stories', 'http://www.businessspectator.com.au/top-stories.rss'),
                      ('Alan Kohler', 'http://www.businessspectator.com.au/bs.nsf/RSS?readform&type=spectators&cat=Alan%20Kohler'),
                      ('Robert Gottliebsen', 'http://www.businessspectator.com.au/bs.nsf/RSS?readform&type=spectators&cat=Robert%20Gottliebsen'),
                      ('Stephen Bartholomeusz', 'http://www.businessspectator.com.au/bs.nsf/RSS?readform&type=spectators&cat=Stephen%20Bartholomeusz'),
                      ('Daily Dossier', 'http://www.businessspectator.com.au/bs.nsf/RSS?readform&type=kgb&cat=dossier'),
                      ('Australia', 'http://www.businessspectator.com.au/bs.nsf/RSS?readform&type=region&cat=australia'),
                    ]
RedDogInCan is offline   Reply With Quote