01-03-2025, 02:32 PM   #2
bucovaina78
Device: elipsa
OK, I figured out how to do it. It's not a perfect recipe, but at least it shows all the article content again. I'd still like to refine it:
  • Images are commented out. If you want them, uncomment the corresponding line in keep_only_tags, but then all images are loaded, which makes the EPUB a lot larger. Is there a way to exclude specific images? For example, the header image with the demorgen.be logo is on every page, and I'd like to leave it out.
  • I'd like to include the author as well, but I can't get that to work.
  • The estimated reading time is also shown in the article, and I don't know how to include it either.
  • I want to exclude the "loading social media posts" blocks.

Code:
#!/usr/bin/env python

__license__ = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
demorgen.be
'''

from calibre.web.feeds.news import BasicNewsRecipe


class DeMorganBe(BasicNewsRecipe):
    title = u'De Morgen'
    __author__ = u'Darko Miletic'
    description = u'News from Belgium in Dutch'
    oldest_article = 3
    language = 'nl_BE'

    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False

    keep_only_tags = [
        dict(name='div', attrs={'class': 'reader-title'}),
        dict(name='h1'),
        dict(name='div', attrs={'class': 'credits'}),
        dict(name='div', attrs={'class': 'meta-data'}),
        # Uncomment to include images (makes the EPUB much larger):
        # dict(name='div', attrs={'class': 'moz-reader-block-img'}), dict(name='img'),
        dict(name='div', attrs={'class': 'header-intro'}),
        dict(name='p'),
    ]

    feeds = [
        (u'Nieuws', u'http://www.demorgen.be/nieuws/rss.xml'),
        (u'In het nieuws', u'https://www.demorgen.be/in-het-nieuws/rss.xml'),
        (u'Niet te missen', u'https://www.demorgen.be/niet-te-missen/rss.xml'),
        (u'Beter leven', u'http://www.demorgen.be/beter-leven/rss.xml'),
        (u'Crisis Midden-Oosten', u'http://www.demorgen.be/aanval-op-israel/rss.xml'),
        (u'Koken met de Morgen', u'http://www.demorgen.be/koken-met-de-morgen/rss.xml'),
        (u'Meningen', u'http://www.demorgen.be/meningen/rss.xml'),
        (u'Politiek', u'http://www.demorgen.be/politiek/rss.xml'),
        (u'TV & Cultuur', u'http://www.demorgen.be/tv-cultuur/rss.xml'),
        (u'Oorlog in Oekraine', u'http://www.demorgen.be/oorlog-in-oekraine/rss.xml'),
        (u'Tech & Wetenschap', u'http://www.demorgen.be/tech-wetenschap/rss.xml'),
        (u'Sport', u'http://www.demorgen.be/sport/rss.xml'),
        (u'Podcasts', u'http://www.demorgen.be/podcasts/rss.xml'),
        (u'Puzzels', u'http://www.demorgen.be/puzzels/rss.xml'),
        (u'Cartoons', u'http://www.demorgen.be/puzzels-cartoons/rss.xml'),
        (u'Achter de schermen', u'http://www.demorgen.be/achter-de-schermen/rss.xml'),
        (u'Best gelezen', u'http://www.demorgen.be/popular/rss.xml')
    ]
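
One possible direction for the open points above, as a rough sketch rather than a tested fix: BasicNewsRecipe also accepts a remove_tags list, which as far as I understand is applied after keep_only_tags, so it can drop a logo image or social media placeholder that would otherwise slip through. Every class name below is a made-up placeholder (I have not checked the demorgen.be markup), and tags_with_class and extra_keep_tags are just illustrative names, not part of calibre.

```python
# Sketch only: every class name below is a hypothetical placeholder.
# Inspect an article page on demorgen.be to find the real ones.


def tags_with_class(name, *classes):
    """Build calibre-style tag specs: one dict per (tag name, class) pair."""
    return [dict(name=name, attrs={'class': c}) for c in classes]


# Elements to drop even if they sit inside something keep_only_tags kept.
remove_tags = (
    tags_with_class('div', 'site-header-logo', 'social-embed')  # hypothetical
    + [dict(name='iframe')]  # embedded players and similar widgets
)

# Extra entries for keep_only_tags: author and estimated reading time.
extra_keep_tags = tags_with_class('span', 'author-name', 'reading-time')  # hypothetical
```

To try it, the remove_tags list would go into the recipe class as a class attribute next to keep_only_tags, and the entries from extra_keep_tags would be added to the keep_only_tags list once the real class names are known.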