Thread: Radio-Canada
View Single Post
Old 06-18-2023, 07:50 PM   #1
quatorze
Junior Member
quatorze began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Jun 2023
Device: Kobo Sage
Radio-Canada

Fiddled with my first recipe. I got it to work, so I figured I might as well share it.

Radio-Canada is the French side of the Canadian Broadcasting Corporation (CBC).

You can visit their site at https://ici.radio-canada.ca

Code:
Spoiler:
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8

__license__ = 'GPL v3'
__author__ = 'quatorze'
__copyright__ = '2023, quatorze'
__version__ = 'v1.0'
__date__ = '18 June 2023'
__description__ = 'Radio-Canada '

'''
https://ici.radio-canada.ca/rss/
'''

from calibre.web.feeds.news import BasicNewsRecipe

def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})

class RadioCanada(BasicNewsRecipe):
    title            = u'Actualités'
    timefmt          = ' %Y-%m-%d'
    language         = 'fr'
    encoding         = 'utf-8'
    publisher        = 'ici.radiocanada.ca'
    publication_type = 'newspaper'
    category         = 'News, finance, economy, politics'
    ignore_duplicate_articles = {'title', 'url'}
    oldest_article   = 1.00
    
    no_stylesheets   = True
    extra_css        = '.blockquote, q, span.signature-name { font-style: italic } .footer, p.fQwTrv, p.xWUSX { font-size: 80%; } ol { list-style: none; } ol li { display: inline; } ol li+li:before { padding: 16px; content: ">" } ul { list-style-type: none; }'
    
    keep_only_tags = [
            classes('document-simple-header-container main-multimedia-item signature-container-top lead-container e-p picture-attachment-container blockquote framed signature-name'),
            dict(id='picture')
    ]

    remove_tags    = [
            classes('signature-link comment-text'),
            dict(name='aside')
    ]

    feeds            = [
        ('Politique',     'https://ici.radio-canada.ca/rss/4175'),
        ('International', 'https://ici.radio-canada.ca/rss/96'),
        ('Montréal',      'https://ici.radio-canada.ca/rss/4201'),
        ('Société',       'https://ici.radio-canada.ca/rss/7110'),
        ('Justice',       'https://ici.radio-canada.ca/rss/92411'),
        ('Science',       'https://ici.radio-canada.ca/rss/4165'),
        ('Santé',         'https://ici.radio-canada.ca/rss/4171'),
        ('Économie',      'https://ici.radio-canada.ca/rss/5717'),
        ('Techno',        'https://ici.radio-canada.ca/rss/4169'),
        ('Environnement', 'https://ici.radio-canada.ca/rss/92408'),
        ('Le reste',      'https://ici.radio-canada.ca/rss/4159')
    ]
    
    # The following was copied and adapted as per the following post:
    # https://www.mobileread.com/forums/showpost.php?p=1165462&postcount=6
    # Credit goes to user Starson17
    def parse_feeds (self): 
      feeds = BasicNewsRecipe.parse_feeds(self) 
      for feed in feeds:
        for article in feed.articles[:]:
          if 'VIDEO' in article.title.upper() or 'OHDIO' in article.title.upper() or '/emissions/' in article.url or '/segments/' in article.url or '/entrevue/' in article.url or '/ohdio/' in article.url:
            feed.articles.remove(article)
      return feeds


I don't have specific problems to report, but one weirdness is that Calibre renders the resulting EPUB perfectly, while my Kobo Aura ONE somehow removes the quote signs around the <q> ... </q> elements. The quote character in French is called the chevron (as in « ... » ) rather than the English " ... " sign. Calibre shows the chevrons just fine, whereas the Kobo just hides it. No biggie, I adjusted the style to at least render in italics, so quotes stand out anyway.

Hope someone else will find it useful.

Cheers,

-xiv
quatorze is offline   Reply With Quote