OK, so I figured out how to do it. It's not a perfect recipe, but at least it shows all the content again. I'd still like to refine it, though:
- Images: if you want images, uncomment the corresponding line. All images are then loaded, which makes the EPUB a lot larger. Is there an option to exclude specific images? For example, the header image with the demorgen.be logo is on every page, and I would like to exclude it.
- Author: I'd like to include the author as well, but somehow I can't make that work.
- Estimated reading time: this is also shown in the article, but I don't know how to get it in either.
- Social media posts: I want to exclude the social media posts that get loaded.
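For the two exclusions in the list above, `remove_tags` looks like the relevant recipe option, and the image-compression options might help with the file size. What follows is only a rough, untested sketch: the class names `site-logo` and `social-embed` are placeholders I made up, not taken from the demorgen.be markup, so they would need to be replaced with whatever the page source actually uses. The feeds and `keep_only_tags` are left out for brevity, and the `ImportError` fallback is only there so the snippet can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so this sketch can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass


class DeMorgenExclusionsSketch(BasicNewsRecipe):
    title = u'De Morgen (exclusions sketch)'
    language = 'nl_BE'
    no_stylesheets = True

    # remove_tags is applied to whatever keep_only_tags left in place.
    # Both class names below are PLACEHOLDERS, not the real site markup.
    remove_tags = [
        dict(name='img', attrs={'class': 'site-logo'}),     # hypothetical logo class
        dict(name='div', attrs={'class': 'social-embed'}),  # hypothetical embed class
        dict(name=['iframe', 'script']),                    # embeds are often loaded this way
    ]

    # Recompress downloaded images so the EPUB stays smaller.
    compress_news_images = True
    compress_news_images_max_size = 50  # upper bound per image, in KB
```

If the logo has no usable class, matching on another attribute in `attrs` might be an alternative, but that again depends on the actual page source.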
Code (the recipe as it currently stands):
#!/usr/bin/env python2
__license__ = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
demorgen.be
'''
from calibre.web.feeds.news import BasicNewsRecipe

class DeMorganBe(BasicNewsRecipe):
    title = u'De Morgen'
    __author__ = u'Darko Miletic'
    description = u'News from Belgium in Dutch'
    oldest_article = 3
    language = 'nl_BE'
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False

    keep_only_tags = [
        dict(name='div', attrs={'class': 'reader-title'}),
        dict(name='h1'),
        dict(name='div', attrs={'class': 'credits'}),
        dict(name='div', attrs={'class': 'meta-data'}),
        # dict(name='div', attrs={'class': 'moz-reader-block-img'}), dict(name='img'),
        dict(name='div', attrs={'class': 'header-intro'}),
        dict(name='p'),
    ]

    feeds = [
        (u'Nieuws', u'http://www.demorgen.be/nieuws/rss.xml'),
        (u'In het nieuws', u'https://www.demorgen.be/in-het-nieuws/rss.xml'),
        (u'Niet te missen', u'https://www.demorgen.be/niet-te-missen/rss.xml'),
        (u'Beter leven', u'http://www.demorgen.be/beter-leven/rss.xml'),
        (u'Crisis Midden-Oosten', u'http://www.demorgen.be/aanval-op-israel/rss.xml'),
        (u'Koken met de Morgen', u'http://www.demorgen.be/koken-met-de-morgen/rss.xml'),
        (u'Meningen', u'http://www.demorgen.be/meningen/rss.xml'),
        (u'Politiek', u'http://www.demorgen.be/politiek/rss.xml'),
        (u'TV & Cultuur', u'http://www.demorgen.be/tv-cultuur/rss.xml'),
        (u'Oorlog in Oekraine', u'http://www.demorgen.be/oorlog-in-oekraine/rss.xml'),
        (u'Tech & Wetenschap', u'http://www.demorgen.be/tech-wetenschap/rss.xml'),
        (u'Sport', u'http://www.demorgen.be/sport/rss.xml'),
        (u'Podcasts', u'http://www.demorgen.be/podcasts/rss.xml'),
        (u'Puzzels', u'http://www.demorgen.be/puzzels/rss.xml'),
        (u'Cartoons', u'http://www.demorgen.be/puzzels-cartoons/rss.xml'),
        (u'Achter de schermen', u'http://www.demorgen.be/achter-de-schermen/rss.xml'),
        (u'Best gelezen', u'http://www.demorgen.be/popular/rss.xml')
    ]
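
For the author, one direction that might be worth trying is the `populate_article_metadata` hook, which calibre calls for each downloaded page of an article. The sketch below is untested against the real site. It assumes the byline sits in the `div` with class `credits` (the one already listed in `keep_only_tags`), that this `div` is still present in the soup when the hook runs, and that the soup is a BeautifulSoup 4 object with `get_text()`. The helper name `byline_from_soup` is my own.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so this sketch can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass


def byline_from_soup(soup):
    """Return the text of the first div.credits, or None if it is missing."""
    tag = soup.find('div', attrs={'class': 'credits'})
    if tag is None:
        return None
    # Collapse the runs of whitespace left over from nested markup.
    text = ' '.join(tag.get_text().split())
    return text or None


class DeMorgenAuthorSketch(BasicNewsRecipe):
    title = u'De Morgen (author sketch)'

    def populate_article_metadata(self, article, soup, first):
        # Only the first page of an article is expected to carry the byline.
        if first:
            author = byline_from_soup(soup)
            if author:
                article.author = author
```

The estimated reading time could probably be picked up the same way from the `meta-data` div, if that is where the site puts it, but I have not checked that.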