Old 05-15-2020, 05:55 AM   #1
PatStapleton
Recipe for Independent Australia news website

Hi everybody,

I had so much fun fixing the ABC News Australia recipe that I decided to crack on and make one for Independent Australia, as there doesn't seem to be one currently.

Enjoy!

Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
__license__ = 'GPL v3'
__copyright__ = '2020, Pat Stapleton <pat.stapleton at gmail.com>'
'''
Recipe for Independent Australia
'''
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.web.feeds import Feed

class IndependentAustralia(BasicNewsRecipe):
    title          = 'Independent Australia'
    language       = 'en_AU'
    __author__     = 'Pat Stapleton'
    description = 'Independent Australia is a progressive journal focusing on politics, democracy, the environment, Australian history and Australian identity. It contains news and opinion from Australia and around the world.'
    oldest_article = 7 #days
    max_articles_per_feed = 100

    feeds          = [
        ('Independent Australia', 'https://feeds.feedburner.com/IndependentAustralia'),
    ]
    
    masthead_url = 'https://independentaustralia.net/t/2018/logo-2018-lg-h90.png'
    cover_url = 'https://independentaustralia.net/t/apple-touch-icon.png'
#    cover_margins = (0,20,'#000000')
    scale_news_images_to_device = True
    publication_type = 'newspaper'

#    auto_cleanup   = True # enable this as a backup option if recipe stops working

#    use_embedded_content = False # if set to true will assume that all the article content is within the feed (i.e. won't try to fetch more data)

    no_stylesheets = True
    remove_javascript = True
    
    keep_only_tags = [dict(name='div', attrs={'class':"art-display"})] # the article content is contained in <div class="art-display">

    # ************************************
    # Clear out all the unwanted html tags:
    # ************************************
    remove_tags = [
        {
            'name': ['meta', 'link', 'noscript', 'script', 'footer']
        },
        {
            'attrs': {
                'class': ['tagFooter', 'noshow', 'panelSubscription', 'mt-2'
                ]
            }
        }
    ]
    
    # ************************************
    # Tidy up the output to look neat for reading
    # ************************************
    remove_attributes = ['width', 'height', 'style']
    extra_css = '.byline{font-size:smaller;margin-bottom:10px;}.inline-caption{display:block;font-size:smaller;text-decoration: none;}'
    compress_news_images = True
    
    # ************************************
    # Break up feed into categories (based on BrianG's code snippet):
    # ************************************
    def parse_feeds(self): 
        # Do the "official" parse_feeds first
        feeds = BasicNewsRecipe.parse_feeds(self) 

        politicsArticles = []
        environmentArticles = []
        businessArticles = []
        lifeArticles = []
        australiaArticles = []
        # Loop through the articles in all feeds and sort them by the base category in their URL
        for curfeed in feeds:
            delList = []
            for curarticle in curfeed.articles:
                url = curarticle.url.lower()
                if 'independentaustralia.net/politics/' in url:
                    politicsArticles.append(curarticle)
                    delList.append(curarticle)
                elif 'independentaustralia.net/environment/' in url:
                    environmentArticles.append(curarticle)
                    delList.append(curarticle)
                elif 'independentaustralia.net/business/' in url:
                    businessArticles.append(curarticle)
                    delList.append(curarticle)
                elif 'independentaustralia.net/life/' in url:
                    lifeArticles.append(curarticle)
                    delList.append(curarticle)
                elif 'independentaustralia.net/australia/' in url:
                    australiaArticles.append(curarticle)
                    delList.append(curarticle)
            # Remove the categorised articles from their original feed
            for d in delList:
                curfeed.articles.remove(d)

        # For each base category with at least one article, create a new Feed,
        # add the articles, and append it to the "official" list of feeds
        for title, articles in (
            ('Politics', politicsArticles),
            ('Environment', environmentArticles),
            ('Business', businessArticles),
            ('Life', lifeArticles),
            ('Australia', australiaArticles),
        ):
            if len(articles) > 0:
                pfeed = Feed()
                pfeed.title = title
                pfeed.image_url  = None
                pfeed.oldest_article = 30
                pfeed.id_counter = len(articles)
                pfeed.articles = articles[:]
                feeds.append(pfeed)

        # Clean up the original feed if all of its articles were moved into categories
        if len(feeds) > 1 and len(feeds[0].articles) == 0:
            del feeds[0]
        return feeds
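
For anyone who wants to try the category-splitting logic outside calibre, here is a minimal standalone sketch of the same idea. It is not part of the recipe: the Article namedtuple is only a stand-in for calibre's article objects, and SECTIONS and split_by_section are hypothetical names made up for this illustration. The URL fragments are the ones used in parse_feeds above.

```python
from collections import OrderedDict, namedtuple

# Stand-in for calibre's article object (assumption: only .title and .url are needed here)
Article = namedtuple('Article', ['title', 'url'])

# Section title -> URL fragment, copied from the recipe's parse_feeds
SECTIONS = OrderedDict([
    ('Politics', 'independentaustralia.net/politics/'),
    ('Environment', 'independentaustralia.net/environment/'),
    ('Business', 'independentaustralia.net/business/'),
    ('Life', 'independentaustralia.net/life/'),
    ('Australia', 'independentaustralia.net/australia/'),
])


def split_by_section(articles):
    """Group articles by the first matching section; return (grouped, leftover)."""
    grouped = OrderedDict((title, []) for title in SECTIONS)
    leftover = []
    for article in articles:
        url = article.url.lower()
        for title, fragment in SECTIONS.items():
            if fragment in url:
                grouped[title].append(article)
                break
        else:
            # No section matched, so the article stays in the original feed
            leftover.append(article)
    return grouped, leftover


if __name__ == '__main__':
    sample = [
        Article('a', 'https://independentaustralia.net/politics/x'),
        Article('b', 'https://independentaustralia.net/Life/y'),
        Article('c', 'https://example.com/z'),
    ]
    grouped, leftover = split_by_section(sample)
    for title, items in grouped.items():
        print(title, [a.title for a in items])
    print('Uncategorised', [a.title for a in leftover])
```

Keeping the sections in one table means a new category only needs one extra line, and the leftover list corresponds to whatever remains in the original feed.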
Old 05-15-2020, 07:05 AM   #2
kovidgoyal
Thanks https://github.com/kovidgoyal/calibr...407983bfc8ca3f
Old 05-15-2020, 06:56 PM   #3
PatStapleton
Thanks, Kovidgoyal.