Avoid confirmation pages before the news | problem with Adnkronos rss feeds

heYooh · 11-15-2014, 08:44 AM

Hello everyone, I want to report a problem with downloading the RSS feeds from the Italian news website Adnkronos.com. It worked very well until a few months ago. Now Calibre produces an ebook that contains only the index, and when you open the chapters you find only the articles' URLs instead of the articles themselves.
I checked the recipe file and all the web addresses are correct, because they match the ones on this page: http://www.adnkronos.com/rss. The only thing I found is that when you open an article (for example, pick one from this page http://rss.feedsportal.com/c/32375/f/448341/index.rss) you are not taken to the actual article. Instead you are redirected to a page that asks you to click to confirm before it lets you see the article. I think this must be the cause of the problem, because it may prevent Calibre from downloading the articles. It's probably a measure that Adnkronos implemented only recently.

Anyway, are you aware of any method to bypass these confirmation pages?

kovidgoyal · 11-15-2014, 09:52 AM

You need to implement get_obfuscated_article() in your recipe. You can use it to bypass ad pages.

heYooh · 11-15-2014, 10:01 AM

Hi Kovid, it's such a pleasure to talk to you! It's my first time here, and first of all I want to thank you for all the great work you've done on Calibre. It is wonderful software and, as far as I can see, it keeps improving.

About this "get_obfuscated_article()" function, is it documented somewhere? I would be glad to fix this recipe and send it back to you so that you can distribute it in the next updates.

kovidgoyal · 11-15-2014, 10:20 AM

The entire news fetch API including that function is documented in the User Manual, at http://manual.calibre-ebook.com/news_recipe.html

heYooh · 11-15-2014, 10:58 AM

I've set articles_are_obfuscated = True but I don't understand how to use the def get_obfuscated_article(self, url): part.

The existing code is:

Code:

def get_article_url(self, article):
    # Use the feed entry's 'id', falling back to 'guid' (None if neither exists)
    link = article.get('id', article.get('guid', None))
    return link

kovidgoyal · 11-15-2014, 11:49 AM

Look at some of the other built-in recipes for examples of its use, for example Forbes India.

The idea is simply that you download the article; if it is an ad, you follow the link to the real article, then save the real article HTML in a temporary file and return the path to the temp file.

If it is not an ad, simply return the original link.
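
The flow described above can be sketched in plain Python, outside calibre, so that it runs on its own. This is only an illustration under stated assumptions: the link text `Continue to the article`, the helper names `find_real_link` and `resolve_article`, and the `fetch` parameter are all hypothetical, and the real wording and markup of the confirmation page would have to be checked in a browser first. In a recipe, the same logic would presumably go inside get_obfuscated_article(self, url), with articles_are_obfuscated = True set on the recipe class.

```python
import tempfile
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# HYPOTHETICAL: text of the link on the confirmation page that leads to the
# real article. Replace it with whatever the actual page uses.
CONTINUE_TEXT = 'Continue to the article'


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None


def find_real_link(html, base_url, continue_text=CONTINUE_TEXT):
    """Return the absolute URL behind the 'continue' link, or None if the
    page has no such link (meaning it is not a confirmation page)."""
    parser = LinkCollector()
    parser.feed(html)
    for href, text in parser.links:
        if continue_text.lower() in text.lower():
            return urljoin(base_url, href)
    return None


def fetch_html(url):
    """Download a page and return its HTML as text."""
    with urlopen(url) as response:
        return response.read().decode('utf-8', 'replace')


def resolve_article(url, fetch=fetch_html):
    """If url is a confirmation page, save the real article to a temp file
    and return that file's path; otherwise return the original link."""
    html = fetch(url)
    real_url = find_real_link(html, url)
    if real_url is None:
        return url
    article_html = fetch(real_url)
    with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False,
                                     encoding='utf-8') as f:
        f.write(article_html)
    return f.name
```

The `fetch` argument exists only so the logic can be tried against canned pages without touching the network; inside a recipe the pages would more likely be fetched through the recipe's own browser, as the Forbes India recipe does.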

heYooh · 11-15-2014, 12:48 PM

Thanks for the hint, I've had a look at the Forbes India recipe. I'm sorry, but I still can't understand how to adapt that code to my case. I'm not a programmer and my understanding of code is very limited.

heYooh · 11-17-2014, 10:37 AM

If you or any other competent person around could take a look at this recipe and try to fix it, I would appreciate it a lot.



