#1 |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2015
Device: Kindle Voyage
|
Trying to make my first recipe
Hello everyone.
The Salon.com recipe built into Calibre only fetches the TV and Books sections; all the other sections it tries to fetch are missing. So I'm trying to make my own Salon.com recipe. I'm starting as simply as possible, with just one RSS feed, but when Calibre fetches it the ebook is empty. This is the feed: http://www.salon.com/category/news/feed/rss/ and this is the recipe:
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1421868592(BasicNewsRecipe):
    title = 'Salon Custom'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True

    feeds = [
        ('News', 'http://www.salon.com/category/news/feed/rss/'),
    ]

Thanks for your help!
Last edited by PeterT; 01-21-2015 at 05:49 PM. Reason: Wrapped recipe in [code]..[/code] so indentation is preserved
#2 |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2015
Device: Kindle Voyage
|
Sample RSS feeds for diagnosis
Attached is a zip file with two sample RSS feeds, one that Calibre can use and one that doesn't work.
The first, "feedburner.rss", is a feed that the built-in Salon.com recipe successfully uses for TV-related articles. The second, "salon_news.rss", is a feed that I tried to use in a recipe, but Calibre doesn't fetch any articles from it. Perhaps looking at the differences will help diagnose the problem.
#3 |
creator of calibre
Posts: 45,200
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If you want to debug things, simply stick print statements in there and they will be output to the log. http://manual.calibre-ebook.com/news...ng-new-recipes
|
#4 |
creator of calibre
Posts: 45,200
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
In this particular case you will find that your RSS feed link only works in a browser; try fetching it with curl or some other non-browser tool and it returns a binary blob. So you will need to adjust the HTTP request used to fetch the feeds, which you can do by re-implementing the parse_feeds() function in your recipe.
|
#5 |
creator of calibre
Posts: 45,200
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Since I was somewhat curious about this, I looked into it some more, and it looks like the salon.com servers are returning gzip-encoded data even if the client does not ask for it. You can work around that by adding the following to the recipe:
Code:
    def get_browser(self, *args, **kwargs):
        br = BasicNewsRecipe.get_browser(self, *args, **kwargs)
        # Have the browser decompress gzip-encoded responses
        br.set_handle_gzip(True)
        return br

Last edited by kovidgoyal; 01-21-2015 at 10:09 PM.
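For anyone who wants to check a downloaded feed by hand, here is a rough standard-library sketch of the idea behind the workaround: look at the first two bytes of the response body and gunzip it if it carries the gzip signature. The helper name `decode_feed_body` is made up for illustration and is not part of calibre; the recipe itself only needs the `get_browser()` override above.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def decode_feed_body(raw):
    """Return the feed body as bytes, gunzipping it if it looks gzip-compressed.

    Hypothetical helper for illustration only; not a calibre API.
    """
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


if __name__ == "__main__":
    # Simulate a server that sends gzip data without being asked.
    sample = b"<rss><channel><title>News</title></channel></rss>"
    blob = gzip.compress(sample)
    print(decode_feed_body(blob) == sample)    # compressed body is restored
    print(decode_feed_body(sample) == sample)  # plain body passes through
```

If a saved copy of the feed starts with those two bytes, the "binary blob" is most likely just the compressed RSS.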
#6 |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2015
Device: Kindle Voyage
|
Thanks!
Thanks for the code to enable gzip unpacking! The recipe works now!
The only thing I wish I had now was some kind of de-duping code or option so that the same story didn't show up in multiple categories. Anyway, here's a new recipe for Salon.com reflecting the current way their feeds are structured:
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1421868592(BasicNewsRecipe):

    def get_browser(self, *args, **kwargs):
        br = BasicNewsRecipe.get_browser(self, *args, **kwargs)
        br.set_handle_gzip(True)
        return br

    title = 'Salon'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True

    feeds = [
        ('News', 'http://www.salon.com/category/news/feed/rss/'),
        ('Politics', 'http://www.salon.com/category/politics/feed/rss/'),
        ('Business', 'http://www.salon.com/category/business/feed/rss/'),
        ('Technology', 'http://www.salon.com/category/technology/feed/rss/'),
        ('Innovation', 'http://www.salon.com/category/innovation/feed/rss/'),
        ('Sustainability', 'http://www.salon.com/category/sustainability/feed/rss/'),
        ('Entertainment', 'http://www.salon.com/category/entertainment/feed/rss/'),
        ('Life', 'http://www.salon.com/category/life/feed/rss/'),
    ]
#7 |
creator of calibre
Posts: 45,200
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
ignore_duplicate_articles = {'title', 'url'}
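To give a feel for what a setting like this does, here is a minimal plain-Python sketch of de-duplicating articles across sections by title and URL. It is not calibre's actual implementation; the function name `drop_duplicates` and the data layout (a list of `(section, articles)` pairs, each article a dict) are assumptions made for the example. In the recipe, the one-line class attribute above is all that is needed.

```python
def drop_duplicates(feeds, keys=("title", "url")):
    """Keep only the first occurrence of each article across all sections.

    feeds: list of (section_name, [article_dict, ...]) pairs.
    An article counts as a duplicate if any of its `keys` values was
    already seen in an earlier article. Hypothetical helper, not calibre code.
    """
    seen = {k: set() for k in keys}
    result = []
    for section, articles in feeds:
        kept = []
        for art in articles:
            values = {k: art.get(k) for k in keys}
            # Missing fields are ignored rather than treated as matches.
            if any(v is not None and v in seen[k] for k, v in values.items()):
                continue
            for k, v in values.items():
                if v is not None:
                    seen[k].add(v)
            kept.append(art)
        result.append((section, kept))
    return result


if __name__ == "__main__":
    feeds = [
        ("News", [{"title": "A", "url": "u1"}, {"title": "B", "url": "u2"}]),
        ("Politics", [{"title": "A", "url": "u3"}, {"title": "D", "url": "u4"}]),
    ]
    for section, articles in drop_duplicates(feeds):
        print(section, [a["title"] for a in articles])
```

In this sketch a story that appears in both News and Politics is kept only in the section listed first.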
|
#8 |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2015
Device: Kindle Voyage
|
Thanks again!
Thanks for that advice. The ignore_duplicate_articles setting works beautifully.
In retrospect I should have noticed that feature myself in the API documentation, but your help here in the forums is very much appreciated nonetheless. |