Old 09-29-2011, 08:51 AM   #1
Junior Member
zephram began at the beginning.
Posts: 1
Karma: 10
Join Date: Sep 2011
Device: Kindle
Fixed Sydney Morning Herald Recipe

The built-in Sydney Morning Herald recipe had a minor but annoying bug: it inserted the text of the "video feedback" form into every article that has an embedded video on the website. I added the following line to the remove_tags list that comes after keep_only_tags, and it fixed the problem.
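For anyone unfamiliar with how a remove_tags entry works, here is a rough standalone sketch. It is not calibre's own code and not the actual line from the fix: the matching function is a simplified stand-in written with the standard library, and the id 'videoFeedback' is a hypothetical name chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# remove_tags-style specs. The 'videoFeedback' id is hypothetical (an assumption
# for this sketch); the first entry mirrors the ids used in the recipe.
remove_tags = [
    dict(name='div', attrs={'id': ['googleAds', 'moreGoogleAds', 'comments']}),
    dict(name='div', attrs={'id': 'videoFeedback'}),
]


def matches(elem, spec):
    """Return True if elem has the spec's tag name and all listed attributes."""
    if elem.tag != spec['name']:
        return False
    for key, wanted in spec.get('attrs', {}).items():
        # An attribute value may be a single string or a list of alternatives.
        allowed = wanted if isinstance(wanted, list) else [wanted]
        if elem.get(key) not in allowed:
            return False
    return True


def strip_matching(root, specs):
    """Remove every descendant of root that matches any of the specs."""
    for parent in list(root.iter()):
        for child in list(parent):
            if any(matches(child, s) for s in specs):
                parent.remove(child)
    return root


html = ('<div id="content"><p>Story text.</p>'
        '<div id="videoFeedback"><p>Video feedback form</p></div>'
        '<div id="comments">...</div></div>')
root = strip_matching(ET.fromstring(html), remove_tags)
cleaned = ET.tostring(root, encoding='unicode')
print(cleaned)
```

In a real recipe you would not write any of this yourself; you only add another dict(...) entry to remove_tags, using whatever tag name and id or class the unwanted block has in the page source.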


Here's the completed recipe, which now produces much cleaner articles.
__license__   = 'GPL v3'
__copyright__ = '2010-2011, Darko Miletic <darko.miletic at>'
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class Smh_au(BasicNewsRecipe):
    title                 = 'The Sydney Morning Herald - Printed edition'
    __author__            = 'Darko Miletic'
    description           = 'Breaking news from Sydney, Australia and the world. Features the latest business, sport, entertainment, travel, lifestyle, and technology news.'
    publisher             = 'Fairfax Digital'
    category              = 'news, politics, Australia, Sydney'
    oldest_article        = 2
    max_articles_per_feed = 200
    no_stylesheets        = True
    encoding              = 'utf-8'
    use_embedded_content  = False
    language              = 'en_AU'
    remove_empty_feeds    = True
    masthead_url          = ''
    publication_type      = 'newspaper'
    extra_css             = """ 
                                h1{font-family: Georgia,"Times New Roman",Times,serif } 
                                body{font-family: Arial,Helvetica,sans-serif} 
                                .cT-imageLandscape,.cT-imagePortrait{font-size: x-small} 
                            """

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        }

    remove_tags = [
                     dict(name='div', attrs={'id':['googleAds','moreGoogleAds','comments']})
                  ]
    remove_tags_after = [dict(name='div',attrs={'class':'articleBody'})]
    keep_only_tags    = [dict(name='div',attrs={'id':'content'})]
    remove_tags       = [ 
    remove_attributes = ['width','height','lang']

    def parse_index(self):
        articles = []
        rawc = self.index_to_soup('',True)
        soup = BeautifulSoup(rawc,fromEncoding=self.encoding)
        for itimg in soup.findAll('img',src=True):
            if itimg['src'].endswith('frontpage.jpg'):
               self.cover_url = itimg['src']

        for item in soup.findAll(attrs={'class':'cN-storyHeadlineLead cfix'}):
            description = ''
            title_prefix = ''
            feed_link = item.find('a',href=True)
            descript = item.find('p')
            if descript:
               description = self.tag_to_string(descript)
            if feed_link:
                url   = feed_link['href']
                title = title_prefix + self.tag_to_string(feed_link)
                date  = strftime(self.timefmt)
                articles.append({
                                  'title'      :title
                                 ,'date'       :date
                                 ,'url'        :url
                                 ,'description':description
                                })
        return [(self.tag_to_string(soup.find('title')), articles)]

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll('bod'):
            item.name = 'div'
        for item in soup.findAll('img'):
            if not item.has_key('alt'):
               item['alt'] = 'image'
        return soup

fix, recipe, smh
