#1
Enthusiast
Posts: 33
Karma: 10
Join Date: Apr 2012
Device: Amazon Kindle Paperwhite
Orlando Sentinel standard recipe not getting most news feeds
The Orlando Sentinel has put a timer box on most of its feeds, and it is keeping the standard recipe from working. The Sentinel is owned by the Chicago Tribune, so this is probably the same issue that Kovid fixed for the Tribune a month ago. You can see the timer box by following this link and then selecting one of the feeds: http://feeds.feedburner.com/orlandosentinel/business
Once the box has timed out, all feeds are accessible for a period of time, even after closing and reopening the browser. The standard recipe and my log file are attached. Hopefully someone can figure out how to fix this recipe. I looked at the code Kovid used to fix the Chicago Tribune recipe but couldn't work out what changes the Sentinel recipe needs.
Attachments: Orlando Sentinel standard recipe.txt, Orlando Sentinel log.txt
#2
creator of calibre
Posts: 45,195
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Here you go:
Code:
import re

try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1279258912(BasicNewsRecipe):
    title = u'Orlando Sentinel'
    oldest_article = 3
    max_articles_per_feed = 100

    feeds = [
        (u'News', u'http://feeds.feedburner.com/orlandosentinel/news'),
        (u'Opinion', u'http://feeds.feedburner.com/orlandosentinel/news/opinion'),
        (u'Business', u'http://feeds.feedburner.com/orlandosentinel/business'),
        (u'Technology', u'http://feeds.feedburner.com/orlandosentinel/technology'),
        (u'Space and Science', u'http://feeds.feedburner.com/orlandosentinel/news/space'),
        (u'Entertainment', u'http://feeds.feedburner.com/orlandosentinel/entertainment'),
        (u'Life and Family', u'http://feeds.feedburner.com/orlandosentinel/features/lifestyle'),
    ]

    __author__ = 'rty'
    publisher = 'OrlandoSentinel.com'
    description = 'Orlando, Florida, Newspaper'
    category = 'News, Orlando, Florida'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    language = 'en'
    encoding = 'utf-8'
    conversion_options = {'linearize_tables': True}
    masthead_url = 'http://www.orlandosentinel.com/media/graphic/2009-07/46844851.gif'
    auto_cleanup = True

    def get_article_url(self, article):
        ans = None
        try:
            # Prefer the article URL carried in the summary's bookmark link.
            s = article.summary
            ans = unquote(
                re.search(r'href=".+?bookmark.cfm.+?link=(.+?)"', s).group(1))
        except Exception:
            pass
        if ans is None:
            link = article.get('feedburner_origlink', None)
            if link and link.split('/')[-1] == "story01.htm":
                # The real URL is encoded in the path segment before
                # story01.htm; undo the two-character substitutions.
                link = link.split('/')[-2]
                encoding = {'0B': '.', '0C': '/', '0A': '0', '0F': '=',
                            '0G': '&', '0D': '?', '0E': '-', '0N': '.com',
                            '0L': 'http:', '0S': '//'}
                for k, v in encoding.items():
                    link = link.replace(k, v)
                ans = link
            elif link:
                ans = link
        if ans is not None:
            return ans.replace('?track=rss', '')
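For anyone wondering what the fallback branch in get_article_url does, here is a rough standalone sketch of just the decoding step, runnable outside calibre. The substitution table is copied from the recipe above. The function name decode_origlink and the sample link are made up for illustration and are not taken from a real Sentinel feed. Treat it as a sketch of the idea, not a tested replacement for the recipe code.

```python
# Substitution table copied from the recipe's fallback branch.
SUBSTITUTIONS = {
    '0B': '.', '0C': '/', '0A': '0', '0F': '=', '0G': '&',
    '0D': '?', '0E': '-', '0N': '.com', '0L': 'http:', '0S': '//',
}


def decode_origlink(link):
    """Return the article URL hidden in a '.../<encoded>/story01.htm' link.

    Hypothetical helper: links that do not end in story01.htm are
    returned unchanged, mirroring the recipe's elif branch.
    """
    parts = link.split('/')
    if parts[-1] != 'story01.htm':
        return link
    encoded = parts[-2]
    # Replacements are applied one after another, as in the recipe.
    for code, text in SUBSTITUTIONS.items():
        encoded = encoded.replace(code, text)
    return encoded


# Made-up sample link (assumption, not from the actual feed).
sample = ('http://example.invalid/c/1/s/'
          '0L0Swww0Bexample0N0Cnews0Cstory0Bhtml/story01.htm')
print(decode_origlink(sample))
```

Because the replacements run in sequence, an earlier substitution could in principle produce text that a later one matches, so oddly decoded links are worth checking by hand.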
#3
Enthusiast
Posts: 33
Karma: 10
Join Date: Apr 2012
Device: Amazon Kindle Paperwhite
Thank you Kovid, the updated version you posted worked perfectly.