Belgian-Dutch recipes Broken (for some time)

Kunvp · 07-08-2016, 07:42 AM

None of the built-in Belgian (Dutch-language) recipes work any more.

As I am a newbie when it comes to recipes,
reporting that they do not work is my only contribution. Sorry.

What I get is a menu and a section menu, but no articles.
It happens with all Belgian news sources. I haven't checked the Dutch sources yet.

| Volgende | Paragraafmenu | Hoofdmenu |
This article was downloaded by calibre from
http://www.gva.be/cnt/dmf20160708_02...haven-zaventem
| Paragraafmenu | Hoofdmenu |

kovidgoyal · 07-08-2016, 08:27 AM

I don't maintain recipes for languages I cannot read, as that makes it much harder to understand the website being scraped. So you will have to hope that someone who both reads the language and knows how to code is willing to help.

Kunvp · 07-08-2016, 09:03 AM

Quote:

Originally Posted by kovidgoyal

I don't maintain recipes for languages I cannot read, as that makes it much harder to understand the website being scraped. So you will have to hope that someone who both reads the language and knows how to code is willing to help.

Thank you for the quick reply.
If nobody comes forward in the next few days, I would suggest deleting the recipes.
If someone does come forward, I can help, as the Belgian newspaper market has changed considerably.

Keep up the good work.

DrChiper · 07-08-2016, 01:45 PM

It would help to mention which ones did not work for you ...
I just tried some Belgian and Dutch recipes and they all work.
Yes, they are slow loading, so be patient!

Aimylios · 07-08-2016, 04:17 PM

Well, the downloaded files do not contain any articles, so I would indeed say they are broken. I just had a quick look at the GVA recipe mentioned in the first post. It was easy to fix, but I'll leave the other recipes to someone else.

Update for gva_be.recipe:

Code:

#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function

__license__   = 'GPL v3'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'

'''
www.gva.be
'''

import re
from calibre.web.feeds.news import BasicNewsRecipe

class GazetvanAntwerpen(BasicNewsRecipe):
    title                 = 'Gazet van Antwerpen'
    __author__            = 'Darko Miletic'
    description           = 'News from Belgium in Dutch'
    publisher             = 'Gazet van Antwerpen'
    category              = 'news, politics, Belgium'
    language              = 'nl_BE'

    oldest_article        = 2
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True

    masthead_url = 'http://2.gvacdn.be/extra/assets/img/gazet-van-antwerpen-red.svg'

    feeds = [
        ('Stad & Regio', 'http://www.gva.be/syndicationservices/artfeedservice.svc/rss/mostrecent/stadenregio'),
        ('Economie', 'http://www.gva.be/syndicationservices/artfeedservice.svc/rss/mostrecent/economie'),
        ('Binnenland', 'http://www.gva.be/syndicationservices/artfeedservice.svc/rss/mostrecent/binnenland'),
        ('Buitenland', 'http://www.gva.be/syndicationservices/artfeedservice.svc/rss/mostrecent/buitenland'),
        ('Media & Cultuur', 'http://www.gva.be/syndicationservices/artfeedservice.svc/rss/mostrecent/mediaencultuur'),
        ('Sport', 'http://www.gva.be/syndicationservices/artfeedservice.svc/rss/mostrecent/sport')
    ]

    keep_only_tags = [
        dict(name='header', attrs={'class':'article__header'}),
        dict(name='footer', attrs={'class':'article__meta'}),
        dict(name='div', attrs={'class':['article', 'article__body', 'slideshow__intro']}),
        dict(name='figure', attrs={'class':'article__image'})
    ]

    remove_tags = [
        dict(name=['embed', 'object']),
        dict(name='div', attrs={'class':['note NotePortrait', 'note']}),
        dict(name='ul', attrs={'class':re.compile('article__share')}),
        dict(name='div', attrs={'class':'slideshow__controls'}),
        dict(name='a', attrs={'role':'button'}),
        dict(name='figure', attrs={'class':re.compile('video')})
    ]

    remove_attributes = ['width', 'height']

    def preprocess_html(self, soup):
        del soup.body['onload']
        for item in soup.findAll(style=True):
            del item['style']
        return soup

Kunvp · 07-09-2016, 06:37 PM

Quote:

Originally Posted by Aimylios

It was easy to fix,

Thank you, Aimylios.

I'll try to understand what you have done, but a hint would be welcome too.
If I manage to understand it, I will have a look at the others.

Aimylios · 07-10-2016, 04:35 AM

Hi Kunvp,

looking at my changes to the gva_be.recipe will probably not help you very much to understand how to work on other recipes. I removed some obsolete code which makes the change look bigger than it actually was.

As far as I can see, all of the Belgian Dutch news sources have a valid table of contents. This means the feed addresses are still correct, but there's something wrong with the extraction of the content. Modifying the keep_only_tags and remove_tags sections should be sufficient in this case.
For example, if you look at the demorgen_be.recipe you will find the line:

Code:

    keep_only_tags = [dict(name='div' , attrs={'class':'art_box2'})]

which means that Calibre expects the content to be wrapped into an html tag like <div class="art_box2">...</div>. But if you look at the source code of an arbitrary article (picture attached) you will see that the relevant tag is <div class="article__wrapper">...</div>. By changing the line above to:

Code:

    keep_only_tags = [dict(name='div' , attrs={'class':'article__wrapper'})]

you should get a working recipe (didn't try it myself).
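To find out which class names a page really uses before editing keep_only_tags, a small standalone script can help. This is only a sketch using the Python standard library, not part of calibre; the names ClassCollector and classes_in are made up for illustration, and the sample HTML is a hand-written stand-in for a real article page.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Count every (tag, class) pair that occurs in an HTML document."""

    def __init__(self):
        super().__init__()
        self.seen = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == 'class' and value:
                # A class attribute may hold several space-separated names.
                for cls in value.split():
                    self.seen[(tag, cls)] += 1


def classes_in(html):
    """Return a Counter of (tag, class) pairs found in the given HTML text."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.seen


# Hand-written sample; in practice, paste the saved source of an article here.
sample = ('<html><body><div class="article__wrapper">'
          '<p class="intro">Tekst</p></div></body></html>')
found = classes_in(sample)
print(('div', 'article__wrapper') in found)  # class the page uses
print(('div', 'art_box2') in found)          # class the old recipe expects
```

If the pair the recipe asks for is missing from the result, the keep_only_tags entry is the likely culprit.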

For an in-depth explanation of recipe programming just have a look at the Calibre documentation:
https://manual.calibre-ebook.com/news.html

Kunvp · 07-11-2016, 05:28 AM

Thank you Aimylios, your explanation is very useful for starting to understand the issue.
Back at school, years (decades, actually) ago, I had to write scripts to convert layout code from PC to high-end systems. This actually looks kind of similar.

I'll give it a try, but don't expect it by tomorrow.
:-)

oCkz7bJ_ · 07-28-2016, 05:11 AM

Quote:

Originally Posted by Aimylios

Update for gva_be.recipe

May I suggest to correct

Code:

    publisher             = 'Gazet van Antwerpen'

to

Code:

    publisher             = 'Mediahuis'

?

Using the example above, I've come up with a recipe for another newspaper from the same publisher, Het Nieuwsblad:

Code:

#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
import re  # needed for the re.compile() calls in remove_tags
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1467571059(BasicNewsRecipe):
    title                 = 'Het Nieuwsblad'
    __author__            = 'Darko Miletic, Aimylios, oCkz7bJ_'
    description           = 'Het Nieuwsblad is goed voor u.'
    publisher             = 'Mediahuis'
    category              = 'news, politics, Belgium'
    language              = 'nl_BE'

    oldest_article        = 2
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    
    cover_url    = 'http://www.lottocyclingcup.be/lc15/dendermedia/images/details/foto/partners_36/nieuwsblad_1_20160210_1242138359.jpg'
    masthead_url = 'http://www.mediahuisconnect.be/uploads/media/5576fa0b83c38/nieuwsblad.svg'

    #Source: http://www.nieuwsblad.be/rss
    feeds          = [
        # Nieuws
        ('Snelnieuws', 'http://feeds.nieuwsblad.be/nieuws/snelnieuws'),
        ('Binnenland', 'http://feeds.nieuwsblad.be/nieuws/binnenland'),
        ('Buitenland', 'http://feeds.nieuwsblad.be/nieuwsblad/buitenland'),
        # Economie
        ('Economie', 'http://feeds.nieuwsblad.be/economie/home'),
        ('Consument', 'http://feeds.nieuwsblad.be/economie/algemeen'),
        ('Bedrijven', 'http://feeds.nieuwsblad.be/economie/bedrijven'),
        ('Werk', 'http://feeds.nieuwsblad.be/economie/Werk'),
        ('Beurs', 'http://feeds.nieuwsblad.be/economie/beurs'),
        # Regio
        #('0123 Region1', 'http://www.nieuwsblad.be/rss.aspx?intro=1&section=postcode&postcode=0123'),
        #('3456 Region2', 'http://www.nieuwsblad.be/rss.aspx?intro=1&section=postcode&postcode=3456'),
        #('6789 Region3', 'http://www.nieuwsblad.be/rss.aspx?intro=1&section=postcode&postcode=6789'),
        # Sport
        ('Voetbal', 'http://feeds.nieuwsblad.be/nieuwsblad/sport/voetbal'),
        ('Wielrennen', 'http://feeds.nieuwsblad.be/nieuwsblad/sport/wielrennen'),
        ('Tennis', 'http://feeds.nieuwsblad.be/nieuwsblad/sport/tennis'),
        ('Autosport', 'http://feeds.nieuwsblad.be/nieuwsblad/sport/autosport'),
        ('Basketbal', 'http://feeds.nieuwsblad.be/nieuwsblad/sport/basketbal'),
        ('Volleybal', 'http://feeds.nieuwsblad.be/nieuwsblad/sport/volleybal'),
        ('Atletiek', 'http://feeds.nieuwsblad.be/nieuwsblad/sport/atletiek'),
        # Extra
        ('Film', 'http://feeds.nieuwsblad.be/life/film'),
        ('Boek', 'http://feeds.nieuwsblad.be/life/boeken'),
        ('Muziek', 'http://feeds.nieuwsblad.be/life/muziek'),
        ('Podium', 'http://feeds.nieuwsblad.be/life/podium'),
        ('TV & Radio', 'http://feeds.nieuwsblad.be/life/tv'),
        # She
        ('BV & Co', 'http://feeds.nieuwsblad.be/life/bv'),
        ('Mode & Design', 'http://feeds.nieuwsblad.be/life/mode'),
        ('Culinair', 'http://feeds.nieuwsblad.be/life/culinair'),
        ('Gezondheid', 'http://feeds.nieuwsblad.be/life/gezondheid'),
        ('Reizen', 'http://feeds.nieuwsblad.be/life/reizen'),
        ('Dieren', 'http://feeds.nieuwsblad.be/life/dieren'),
        # Weblog
        ('Surfplank', 'http://nieuwsblad.typepad.com/surfplank/atom.xml'),
        ('Boeken', 'http://nieuwsblad.typepad.com/boeken/atom.xml'),
        ('Strips', 'http://nieuwsblad.typepad.com/strips/atom.xml'),
        ('DVD', 'http://nieuwsblad.typepad.com/dvd/atom.xml'),
        ('Dierendoktor', 'http://nieuwsblad.typepad.com/dierendokter/atom.xml'),
        ('Zapdog', 'http://nieuwsblad.typepad.com/zapdog/atom.xml'),
    ]
    
    keep_only_tags = [
        dict(name='header', attrs={'class':'article__header'}),
        dict(name='footer', attrs={'class':'article__meta'}),
        dict(name='div', attrs={'class':['article', 'article__body', 'slideshow__intro']}),
        dict(name='figure', attrs={'class':'article__image'})
    ]

    remove_tags = [
        dict(name=['embed', 'object']),
        dict(name='div', attrs={'class':['note NotePortrait', 'note']}),
        dict(name='ul', attrs={'class':re.compile('article__share')}),
        dict(name='div', attrs={'class':'slideshow__controls'}),
        dict(name='a', attrs={'role':'button'}),
        dict(name='figure', attrs={'class':re.compile('video')})
    ]

    remove_attributes = ['width', 'height']

    def preprocess_html(self, soup):
        del soup.body['onload']
        for item in soup.findAll(style=True):
            del item['style']
        return soup

Note: under "# Regio", only one of the three postal codes I'm interested in seems to generate any content, even though the RSS feeds do exist. I still need to figure out a solution for that. (The ones in the recipe above are fake placeholders.)
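While debugging the regional feeds, it may be easier to generate the entries from a list of postcodes than to edit the URLs by hand. A minimal sketch, assuming the URL pattern from the commented-out lines above; the helper name regional_feeds is hypothetical and the postcodes are the same fake placeholders:

```python
from __future__ import print_function

# URL pattern copied from the commented-out "Regio" lines in the recipe.
REGIO_URL = ('http://www.nieuwsblad.be/rss.aspx'
             '?intro=1&section=postcode&postcode={code}')


def regional_feeds(regions):
    """Build (title, url) feed tuples from (postcode, name) pairs."""
    return [('{0} {1}'.format(code, name), REGIO_URL.format(code=code))
            for code, name in regions]


# Placeholder postcodes, not real regions.
regio = regional_feeds([('0123', 'Region1'), ('3456', 'Region2')])
for title, url in regio:
    print(title, url)
```

The resulting list could be appended to feeds in the recipe. This does not explain why some postcodes come back empty; setting calibre's remove_empty_feeds = True in the recipe should at least keep the empty sections out of the download.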

oCkz7bJ_ · 07-28-2016, 07:36 AM

Here's a fairly simple one for DataNews:

Code:

#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1468055030(BasicNewsRecipe):
    title                 = 'DataNews'
    __author__            = 'oCkz7bJ_'
    description           = 'Technology / Best Practice / Business'
    publisher             = 'Roularta Media Group'
    category              = 'news, information technology, Belgium'
    language              = 'nl_BE'

    oldest_article        = 2
    max_articles_per_feed = 100
    auto_cleanup          = True
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    
    cover_url    = 'http://datablend.be/wp-content/uploads/2014/01/Data_News_logo-short.jpg'
    masthead_url = 'http://datanews.knack.be/images/svg/logos/logo_Site-DataNews-NL.svg'

    # Source: http://datanews.knack.be/rss/
    feeds          = [
        ('Technology', 'http://datanews.knack.be/ict/feed.rss'),
        ('Opinie', 'http://datanews.knack.be/ict/opinie/feed.rss'),
        ('Gadgets', 'http://datanews.knack.be/ict/gadgets/feed.rss'),
        ('Foto', 'http://datanews.knack.be/ict/foto/feed.rss'),
        ('Nieuws', 'http://datanews.knack.be/ict/nieuws/feed.rss'),
        ('Reviews', 'http://datanews.knack.be/ict/reviews/feed.rss'),
        ('Startups', 'http://datanews.knack.be/ict/start-ups/feed.rss'),
    ]

Kunvp · 07-29-2016, 04:37 AM

Quote:

Originally Posted by oCkz7bJ_

May I suggest to correct

Code:

    publisher             = 'Gazet van Antwerpen'

to

Code:

    publisher             = 'Mediahuis'

?

Dear oCkz7bJ_,

Thank you so much for having a look at this. I know too little to write clean scripts.

To answer your question directly: I wouldn't use 'Mediahuis' as the title, as users in Belgium don't recognise it as a news source.
Unless you don't want to change the title of the source, that is, because it is correct that the publisher is Mediahuis.
Mediahuis (founded 2013) is a joint venture of two publishers (newspapers and online news sites). (Wikipedia)

They run the following sites:
1. Het Nieuwsblad / De Gentenaar
2. Gazet van Antwerpen
3. Het Belang van Limburg

The different titles have different regional content. All these titles have a strong focus on regional content (in the paper edition).
This is reflected on their websites.

De Standaard is the so-called quality newspaper of the group.
It uses a lot of content from the above-mentioned Het Nieuwsblad, but has more editorials, opinion pieces, etc.

All Belgian news sources have reduced the length and number of free articles.

To summarise:
- Mediahuis will not be recognised by users. That is not a problem as long as it does not affect the title visible to users.
- It can be expected that the scripts for the different titles will be fairly identical.
- The regional content should be different.

Does all this answer your question?
I have the feeling I made it all too complex. :-p

Many greetings,
Koen

oCkz7bJ_ · 07-29-2016, 05:51 AM

Quote:

Originally Posted by Kunvp

Unless you don't want to change the title of the source, that is, because it is correct that the publisher is Mediahuis.

That is exactly what I propose:

Code:

    title                 = 'Gazet van Antwerpen'
    publisher             = 'Gazet van Antwerpen'

should be

Code:

    title                 = 'Gazet van Antwerpen'
    publisher             = 'Mediahuis'

Quote:

Originally Posted by Kunvp

Mediahuis (founded 2013) is a joint venture of two publishers (newspapers and online news sites). (Wikipedia)

I'm from the same country as you are ;-)

Quote:

Originally Posted by Kunvp

De Standaard is the so-called quality newspaper of the group.
It uses a lot of content from the above-mentioned Het Nieuwsblad, but has more editorials, opinion pieces, etc.

All Belgian news sources have reduced the length and number of free articles.

I'll work on a recipe for "De Standaard"; the backend for all of these publications is the same, so it should be fairly easy. Give me a couple of days, as I prefer to test it myself first.

I suspect even a digital subscription will not provide a full-content RSS feed; it's either the app (iOS & Android only) or the website. I'll try asking around through contacts. I'd consider a subscription if they published their newspaper in a proper ebook format. (None of the Belgian publishers do, AFAIK.)

oCkz7bJ_ · 08-04-2016, 02:44 PM

Here's a recipe that seems to work for De Standaard:

Code:

#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
import re  # needed for the re.compile() calls in remove_tags
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1467571059(BasicNewsRecipe):
    title                 = 'De Standaard'
    __author__            = 'Darko Miletic, Aimylios, oCkz7bJ_'
    description           = 'De Standaard'
    publisher             = 'Mediahuis'
    category              = 'news, politics, Belgium'
    language              = 'nl_BE'

    oldest_article        = 2
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    
    cover_url    = 'http://www.standaard.be/extra/assets/extra/dslive/headers/ds-black.svg'
    masthead_url = 'http://tonysweb.be/m/img/tijdschriften/de_standaard.svg'

    #Source: http://www.standaard.be/rssfeeds
    feeds          = [
        # Nieuws
        ('Binnenland', 'http://www.standaard.be/rss/section/1f2838d4-99ea-49f0-9102-138784c7ea7c'),
        ('Buitenland', 'http://www.standaard.be/rss/section/e70ccf13-a2f0-42b0-8bd3-e32d424a0aa0'),
        ('Cultuur', 'http://www.standaard.be/rss/section/ab8d3fd8-bf2f-487a-818b-9ea546e9a859'),
        ('Media', 'http://www.standaard.be/rss/section/eb1a6433-ca3f-4a3b-ab48-a81a5fb8f6e2'),
        ('Economie', 'http://www.standaard.be/rss/section/451c8e1e-f9e4-450e-aa1f-341eab6742cc'),
        ('Sport', 'http://www.standaard.be/rss/section/8f693cea-dba8-46e4-8575-807d1dc2bcb7'),
        ('Beroemd en Bizar', 'http://www.standaard.be/rss/section/113a9a78-f65a-47a8-bd1c-b24483321d0f'),
        # Standaard.biz
        ('Overzicht', 'http://www.standaard.be/rss/section/a30afc42-3737-4301-8f8a-5b6833855457'),
        ('Economie', 'http://www.standaard.be/rss/section/212b8b54-bd91-4c8b-942c-8029e8797d36'),
        ('Bedrijven', 'http://www.standaard.be/rss/section/6aa8d4fa-4b9a-40d5-aa8f-87ac72472f27'),
        ('Consument', 'http://www.standaard.be/rss/section/46025691-2ec4-4a06-b6d7-9773686a24a7'),
        ('Beurs', 'http://www.standaard.be/rss/section/74cef9d1-3b28-4b90-943a-ce685bf6ed6e'),
        ('Marketing & Media', 'http://www.standaard.be/rss/section/9bdf4a14-f8bf-4439-aaf1-344181f73e73'),
        ('Mobilia', 'http://www.standaard.be/rss/section/270b7f8f-dd73-44cb-b622-9f7200a439a7'),
        # Lifestyle
        ('Mode', 'http://www.standaard.be/rss/section/3a4b39a1-e58f-42e4-8ae9-a0f90f97f27f'),
        ('Beauty', 'http://www.standaard.be/rss/section/51dd6a40-e297-409c-af25-9f0301159a1c'),
        ('Culinair', 'http://www.standaard.be/rss/section/ec1dbffa-a00b-48e6-96f0-00d215f90744'),
        ('Reizen', 'http://www.standaard.be/rss/section/eed96e23-ed90-4818-83ab-adabf8caf0f4'),
        ('Design & Wonen', 'http://www.standaard.be/rss/section/f4dd4e8d-6cb1-4eef-abc2-06b0e3d72de4'),
        ('Gezondheid & Psycho', 'http://www.standaard.be/rss/section/a166bb48-b6b4-4c1a-beb3-9f0301160b75'),
        ('Glamour', 'http://www.standaard.be/rss/section/06b5429e-beb1-4e76-909c-9f0301162a9c'),
        ('Lifestyleblog', 'http://www.standaard.be/rss/section/246d27cb-ce7b-4245-bad4-a09f0119b450'),
        # Weblogs
        ('Autoblog', 'http://www.standaard.be/rss/tag/autoblog'),
        ('Beursexperts', 'http://www.standaard.be/rss/tag/beursexperts'),
        ('En nu even elders', 'http://www.standaard.be/rss/tag/blog-en-nu-even-elders'),
        ('Marketingblog', 'http://www.standaard.be/rss/tag/marketingblog'),
        ('TV-blog', 'http://www.standaard.be/rss/tag/tv-blog'),
        # Interactie
        ('Opinies', 'http://feeds.feedburner.com/dso-meningen-opinie')
    ]
    
    keep_only_tags = [
        dict(name='header', attrs={'class':'article__header'}),
        dict(name='footer', attrs={'class':'article__meta'}),
        #dict(name='div', attrs={'class':['article', 'article__body', 'slideshow__intro']}),
        dict(name='article', attrs={'class':'article-full'}),
        dict(name='figure', attrs={'class':'article__image'})
    ]

    remove_tags = [
        dict(name=['embed', 'object']),
        dict(name='div', attrs={'class':['note NotePortrait', 'note']}),
        dict(name='ul', attrs={'class':re.compile('article__share')}),
        dict(name='div', attrs={'class':'slideshow__controls'}),
        dict(name='a', attrs={'role':'button'}),
        dict(name='figure', attrs={'class':re.compile('video')})
    ]

    remove_attributes = ['width', 'height']

    def preprocess_html(self, soup):
        del soup.body['onload']
        for item in soup.findAll(style=True):
            del item['style']
        return soup

De Standaard seems to have a slightly different structure for its web pages. I had to make a small modification to keep_only_tags:

Code:

        #dict(name='div', attrs={'class':['article', 'article__body', 'slideshow__intro']}),
        dict(name='article', attrs={'class':'article-full'}),

Kunvp · 08-05-2016, 08:47 AM

Quote:

Originally Posted by oCkz7bJ_

Here's a recipe that seems to work for De Standaard:

Thank you. I'll give it a try.
Koen

dldrmsmn · 07-21-2017, 01:31 PM

I just bumped into this thread through a Google search. Thanks so much for making this code and making e-readers a tad more worthwhile. I have to say I haven't tried it yet, but I have full digital access to De Standaard and was wondering if there's any way I can download that day's newspaper to read it on my e-reader? Probably not without a serious overhaul?
