#1
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2024
Device: elipsa
recipe request for demorgen.be
I'm rather new to Calibre and e-readers. I am trying to download the news from the website https://www.demorgen.be with calibre, but I only get the titles and not the actual content. I looked at the source code of the built-in recipe, which I believe is this link.
Most of the articles are behind a paywall, but it's a "soft" paywall: if you disable JavaScript or use the reader mode of a browser, you can read all articles. Is there anyone who can help with a new recipe? I am a bit technical, but programming/Python is not one of my strengths, unfortunately.
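One way to check the "soft paywall" observation outside a browser is to look at the raw HTML of an article and see whether the body text is already there in `<p>` elements. The following is only a minimal sketch using the Python standard library. The helper name `paragraph_texts` is made up for this illustration, and it assumes the article text lives in plain `<p>` tags, which is what the reader-mode behaviour suggests but is not confirmed here.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of every <p> element, ignoring <script>/<style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._in_p = False   # True while inside a <p>
        self._skip = 0       # > 0 while inside <script> or <style>
        self._buf = []       # text pieces of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip += 1
        elif tag == 'p':
            self._flush()    # a new <p> implicitly closes the previous one
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ('script', 'style'):
            self._skip = max(0, self._skip - 1)
        elif tag == 'p':
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._skip:
            self._buf.append(data)

    def _flush(self):
        # Collapse whitespace and keep only non-empty paragraphs.
        text = ' '.join(''.join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []


def paragraph_texts(html):
    """Return the non-empty paragraph texts found in an HTML string."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    parser._flush()          # pick up a trailing, unclosed <p>
    return parser.paragraphs
```

To try it, save an article page without running its scripts (or fetch it with `urllib.request.urlopen`) and pass the decoded HTML to `paragraph_texts`. If the list contains the full article text, the content is available without JavaScript and a recipe only has to select the right tags.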
#2
OK, so I figured out how to do it. It's not a perfect recipe, but at least it shows all the content again. I'd like to have it refined though.
Code:
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
demorgen.be
'''

from calibre.web.feeds.news import BasicNewsRecipe


class DeMorganBe(BasicNewsRecipe):
    title = u'De Morgen'
    __author__ = u'Darko Miletic'
    description = u'News from Belgium in Dutch'
    oldest_article = 3  # days
    language = 'nl_BE'
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False

    # Keep only the title, byline, metadata, intro and body paragraphs.
    keep_only_tags = [
        dict(name='div', attrs={'class': 'reader-title'}),
        dict(name='h1'),
        dict(name='div', attrs={'class': 'credits'}),
        dict(name='div', attrs={'class': 'meta-data'}),
        # dict(name='div', attrs={'class': 'moz-reader-block-img'}), dict(name='img'),
        dict(name='div', attrs={'class': 'header-intro'}),
        dict(name='p'),
    ]

    feeds = [
        (u'Nieuws', u'http://www.demorgen.be/nieuws/rss.xml'),
        (u'In het nieuws', u'https://www.demorgen.be/in-het-nieuws/rss.xml'),
        (u'Niet te missen', u'https://www.demorgen.be/niet-te-missen/rss.xml'),
        (u'Beter leven', u'http://www.demorgen.be/beter-leven/rss.xml'),
        (u'Crisis Midden-Oosten', u'http://www.demorgen.be/aanval-op-israel/rss.xml'),
        (u'Koken met de Morgen', u'http://www.demorgen.be/koken-met-de-morgen/rss.xml'),
        (u'Meningen', u'http://www.demorgen.be/meningen/rss.xml'),
        (u'Politiek', u'http://www.demorgen.be/politiek/rss.xml'),
        (u'TV & Cultuur', u'http://www.demorgen.be/tv-cultuur/rss.xml'),
        (u'Oorlog in Oekraine', u'http://www.demorgen.be/oorlog-in-oekraine/rss.xml'),
        (u'Tech & Wetenschap', u'http://www.demorgen.be/tech-wetenschap/rss.xml'),
        (u'Sport', u'http://www.demorgen.be/sport/rss.xml'),
        (u'Podcasts', u'http://www.demorgen.be/podcasts/rss.xml'),
        (u'Puzzels', u'http://www.demorgen.be/puzzels/rss.xml'),
        (u'Cartoons', u'http://www.demorgen.be/puzzels-cartoons/rss.xml'),
        (u'Achter de schermen', u'http://www.demorgen.be/achter-de-schermen/rss.xml'),
        (u'Best gelezen', u'http://www.demorgen.be/popular/rss.xml')
    ]
#3
Unless I'm mistaken, I can't edit my post above anymore. I improved the recipe yesterday evening. The most annoying remaining problem is the empty pages between articles. Pictures are included now, but some unwanted ones remain, and I don't know how to exclude them without excluding all pictures. The cover picture also isn't ideal, and the titles of the "chapters" don't match the actual content. Anyway, here's the new code. It still needs work, but at least the content is there again.
Code:
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
demorgen.be
'''

from calibre.web.feeds.news import BasicNewsRecipe


class DeMorganBe(BasicNewsRecipe):
    title = u'De Morgen'
    __author__ = u'Darko Miletic'
    description = u'News from Belgium in Dutch'
    oldest_article = 1  # days
    language = 'nl_BE'
    max_articles_per_feed = 100
    no_stylesheets = False
    use_embedded_content = False

    def get_cover_url(self):
        # Placeholder cover image; not ideal yet.
        cover_url = "https://usercontent.one/wp/www.insidejazz.be/wp-content/uploads/2018/11/pic0143.png"
        return cover_url

    # Keep only the title, byline, metadata, images, intro and body paragraphs.
    keep_only_tags = [
        dict(name='div', attrs={'class': 'reader-title'}),
        dict(name='h1'),
        dict(name='div', attrs={'class': 'credits'}),
        dict(name='div', attrs={'class': 'meta-data'}),
        dict(name='div', attrs={'class': 'moz-reader-block-img'}), dict(name='img'),
        dict(name='div', attrs={'class': 'header-intro'}),
        dict(name='p'),
    ]

    # Drop unwanted paragraphs, images and subheadings from what was kept.
    remove_tags = [
        # dict(name='script'),
        dict(name='p', attrs={'class': 'rtlowr1'}),
        dict(name='p', attrs={'class': 'qmn3qt1'}),
        dict(name='img', attrs={'class': '_1ubw0re1 _3ej1u36'}),
        dict(name='img', attrs={'class': '_15tatjw0'}),
        # dict(name='ul', attrs={'class': 'bulletSeparatedList'}),
        # dict(name='a', attrs={'class': 'shareImage'}),
        dict(name='h2'),
    ]

    feeds = [
        (u'Nieuws', u'http://www.demorgen.be/nieuws/rss.xml'),
        (u'In het nieuws', u'https://www.demorgen.be/in-het-nieuws/rss.xml'),
        (u'Niet te missen', u'https://www.demorgen.be/niet-te-missen/rss.xml'),
        (u'Beter leven', u'http://www.demorgen.be/beter-leven/rss.xml'),
        (u'Crisis Midden-Oosten', u'http://www.demorgen.be/aanval-op-israel/rss.xml'),
        # (u'Koken met de Morgen', u'http://www.demorgen.be/koken-met-de-morgen/rss.xml'),
        (u'Meningen', u'http://www.demorgen.be/meningen/rss.xml'),
        (u'Politiek', u'http://www.demorgen.be/politiek/rss.xml'),
        (u'TV & Cultuur', u'http://www.demorgen.be/tv-cultuur/rss.xml'),
        (u'Oorlog in Oekraine', u'http://www.demorgen.be/oorlog-in-oekraine/rss.xml'),
        (u'Tech & Wetenschap', u'http://www.demorgen.be/tech-wetenschap/rss.xml'),
        # (u'Sport', u'http://www.demorgen.be/sport/rss.xml'),
        # (u'Podcasts', u'http://www.demorgen.be/podcasts/rss.xml'),
        # (u'Puzzels', u'http://www.demorgen.be/puzzels/rss.xml'),
        # (u'Cartoons', u'http://www.demorgen.be/puzzels-cartoons/rss.xml'),
        # (u'Achter de schermen', u'http://www.demorgen.be/achter-de-schermen/rss.xml'),
        # (u'Best gelezen', u'http://www.demorgen.be/popular/rss.xml')
    ]
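For the unwanted pictures, one option that might help is to filter images in `preprocess_html` instead of relying only on `remove_tags`. That way an image is dropped only when it carries one of the known class combinations, regardless of the order in which the classes appear. This is an untested sketch, not a finished recipe. The class names are copied from the `remove_tags` list above and look build-generated, so they may change. The names `UNWANTED_IMG_CLASS_GROUPS`, `is_unwanted_image` and `DeMorgenImageFilter` are invented for this example, and the `try`/`except` around the import exists only so the helpers can be tried outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Outside calibre: fall back to a plain base class so the helpers run.
    BasicNewsRecipe = object

# Each group lists classes that must ALL be present on an unwanted image.
# Values are taken from the remove_tags list in the recipe above.
UNWANTED_IMG_CLASS_GROUPS = [
    frozenset({'_1ubw0re1', '_3ej1u36'}),
    frozenset({'_15tatjw0'}),
]


def as_class_set(value):
    """Normalise a class attribute (None, 'a b' or ['a', 'b']) to a set."""
    if value is None:
        return set()
    if isinstance(value, str):
        return set(value.split())
    return set(value)


def is_unwanted_image(class_value, groups=UNWANTED_IMG_CLASS_GROUPS):
    """True if the image carries every class of at least one group."""
    classes = as_class_set(class_value)
    return any(group <= classes for group in groups)


class DeMorgenImageFilter(BasicNewsRecipe):  # hypothetical class name
    title = u'De Morgen (image filter sketch)'

    def preprocess_html(self, soup):
        # Remove only the images matching an unwanted class group;
        # every other <img> is left in place.
        for img in soup.find_all('img'):
            if is_unwanted_image(img.get('class')):
                img.extract()
        return soup
```

In a real recipe the `preprocess_html` method and the two helpers would go into the existing `DeMorganBe` recipe file rather than a separate class. New unwanted images can then be handled by adding another group to the list.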