MobileRead Forums - View Single Post - Belgian-Dutch recipes Broken (for some time)

Aimylios · 07-10-2016, 04:35 AM

Hi Kunvp,

Looking at my changes to gva_be.recipe probably won't help you much in understanding how to work on other recipes. I removed some obsolete code, which makes the change look bigger than it actually was.

As far as I can see, all of the Belgian Dutch news sources have a valid table of contents. This means the feed addresses are still correct, but there's something wrong with the extraction of the content. Modifying the keep_only_tags and remove_tags sections should be sufficient in this case.
For example, if you look at the demorgen_be.recipe you will find the line:

Code:

    keep_only_tags = [dict(name='div' , attrs={'class':'art_box2'})]

which means that Calibre expects the content to be wrapped in an HTML tag like <div class="art_box2">...</div>. But if you look at the source code of an arbitrary article (picture attached), you will see that the relevant tag is now <div class="article__wrapper">...</div>. By changing the line above to:

Code:

    keep_only_tags = [dict(name='div' , attrs={'class':'article__wrapper'})]

you should get a working recipe (I haven't tested it myself).
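
If you don't want to read through the page source by hand, a small script can tell you which div classes actually occur in a saved article page. This is only a rough sketch using the Python standard library, not anything from Calibre itself: the names DivClassCounter and count_div_classes and the sample HTML are made up for illustration, and you would feed it the real source of an article instead.

Code:

```python
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Count <div> elements per CSS class name in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        # The class attribute may hold several space-separated class names.
        for cls in (dict(attrs).get('class') or '').split():
            self.counts[cls] = self.counts.get(cls, 0) + 1


def count_div_classes(html):
    """Return a dict mapping each div class name to its number of occurrences."""
    parser = DivClassCounter()
    parser.feed(html)
    return parser.counts


# Made-up sample standing in for the saved source of an article page.
sample = '<html><body><div class="article__wrapper"><p>Text</p></div></body></html>'

counts = count_div_classes(sample)
for candidate in ('art_box2', 'article__wrapper'):
    # A count of 0 suggests a keep_only_tags entry for that class would match nothing.
    print(candidate, counts.get(candidate, 0))
```

If the class from the recipe shows up with a count of 0 and another class wraps the article text, that other class is the candidate to put into keep_only_tags.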

For an in-depth explanation of recipe programming just have a look at the Calibre documentation:
https://manual.calibre-ebook.com/news.html
