07-10-2016, 04:35 AM   #7
Aimylios
Hi Kunvp,

looking at my changes to gva_be.recipe will probably not help you much in understanding how to work on other recipes. I also removed some obsolete code, which makes the change look bigger than it actually was.

As far as I can see, all of the Belgian Dutch news sources have a valid table of contents. This means the feed addresses are still correct, but there's something wrong with the extraction of the content. Modifying the keep_only_tags and remove_tags sections should be sufficient in this case.
For example, if you look at the demorgen_be.recipe you will find the line:
Code:
    keep_only_tags = [dict(name='div', attrs={'class': 'art_box2'})]
which means that Calibre expects the content to be wrapped in an HTML tag like <div class="art_box2">...</div>. But if you look at the source code of an arbitrary article (picture attached), you will see that the relevant tag is now <div class="article__wrapper">...</div>. By changing the line above to:
Code:
    keep_only_tags = [dict(name='div', attrs={'class': 'article__wrapper'})]
you should get a working recipe (I haven't tried it myself).
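
If you want to check whether a class name actually occurs in an article's HTML before editing a recipe, something like the following might help. It is only a rough sketch using Python's standard library: ClassFinder, has_wrapper and the sample markup are made up for illustration and are not part of Calibre, and the check only tells you that the tag is present in the page, not how Calibre itself matches it.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Record whether any <tag> in the document carries a given class name."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # The class attribute may hold several space-separated names
        classes = (dict(attrs).get('class') or '').split()
        if self.class_name in classes:
            self.found = True


def has_wrapper(html, tag, class_name):
    """Return True if html contains <tag class="... class_name ...">."""
    finder = ClassFinder(tag, class_name)
    finder.feed(html)
    return finder.found


# Hypothetical article markup, invented for this example
sample = ('<html><body><div class="article__wrapper">'
          '<p>Text</p></div></body></html>')

print(has_wrapper(sample, 'div', 'art_box2'))          # old class from the recipe
print(has_wrapper(sample, 'div', 'article__wrapper'))  # class seen in the page source
```

To use it on a real article, you would save the page source to a file, read it into a string, and pass that string in place of sample.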

For an in-depth explanation of recipe programming, have a look at the Calibre documentation:
https://manual.calibre-ebook.com/news.html
Attached Thumbnails
Name: example_demorgen.jpg