#1
Enthusiast
Posts: 33
Karma: 10
Join Date: Apr 2012
Device: Amazon Kindle Paperwhite
Orlando Sentinel standard recipe not getting most news feeds
The Orlando Sentinel has added a timer box to most of its feeds, and it is keeping the standard recipe from working. The Sentinel has the same owner as the Chicago Tribune, so this is probably the same issue that Kovid fixed for the Tribune a month ago. You can see the timer box by following this link and then selecting one of the feeds: http://feeds.feedburner.com/orlandosentinel/business
Once the box has timed out, all feeds are accessible for a period of time, even after closing and reopening the browser. The standard recipe and my log file are attached. Hopefully someone can figure out how to fix this recipe. I looked at the code Kovid used to fix the Chicago Tribune but couldn't work out what changes were needed for the Sentinel.
Attachments: Orlando Sentinel standard recipe.txt, Orlando Sentinel log.txt
#2
creator of calibre
Posts: 45,598
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Here you go:
Code:
    # Note: urllib.unquote and dict.iteritems are Python 2 APIs.
    import urllib, re
    from calibre.web.feeds.news import BasicNewsRecipe

    class AdvancedUserRecipe1279258912(BasicNewsRecipe):
        title = u'Orlando Sentinel'
        oldest_article = 3
        max_articles_per_feed = 100
        feeds = [
            (u'News', u'http://feeds.feedburner.com/orlandosentinel/news'),
            (u'Opinion', u'http://feeds.feedburner.com/orlandosentinel/news/opinion'),
            (u'Business', u'http://feeds.feedburner.com/orlandosentinel/business'),
            (u'Technology', u'http://feeds.feedburner.com/orlandosentinel/technology'),
            (u'Space and Science', u'http://feeds.feedburner.com/orlandosentinel/news/space'),
            (u'Entertainment', u'http://feeds.feedburner.com/orlandosentinel/entertainment'),
            (u'Life and Family', u'http://feeds.feedburner.com/orlandosentinel/features/lifestyle'),
        ]
        __author__ = 'rty'
        publisher = 'OrlandoSentinel.com'
        description = 'Orlando, Florida, Newspaper'
        category = 'News, Orlando, Florida'
        remove_javascript = True
        use_embedded_content = False
        no_stylesheets = True
        language = 'en'
        encoding = 'utf-8'
        conversion_options = {'linearize_tables':True}
        masthead_url = 'http://www.orlandosentinel.com/media/graphic/2009-07/46844851.gif'
        auto_cleanup = True

        def get_article_url(self, article):
            ans = None
            # First choice: the real article URL embedded in the summary's
            # bookmark link.
            try:
                s = article.summary
                ans = urllib.unquote(
                    re.search(r'href=".+?bookmark.cfm.+?link=(.+?)"', s).group(1))
            except Exception:
                pass
            if ans is None:
                # Fall back to the original link recorded by FeedBurner.
                link = article.get('feedburner_origlink', None)
                if link and link.split('/')[-1]=="story01.htm":
                    # The second-to-last path segment holds the real URL
                    # with its punctuation replaced by two-character codes.
                    link=link.split('/')[-2]
                    encoding = {'0B': '.', '0C': '/', '0A': '0', '0F': '=', '0G': '&',
                                '0D': '?', '0E': '-', '0N': '.com', '0L': 'http:',
                                '0S':'//'}
                    for k, v in encoding.iteritems():
                        link = link.replace(k, v)
                    ans = link
                elif link:
                    ans = link
            if ans is not None:
                return ans.replace('?track=rss', '')
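For anyone who wants to test the two URL-recovery steps outside calibre, here is a minimal standalone sketch in Python 3. It is not part of the recipe: the function names are made up for illustration, and the sample links in the usage note are hypothetical. The escape table is copied from the recipe, but the decoding is done in a single left-to-right regex pass rather than the recipe's repeated `replace` calls, so the result does not depend on dictionary iteration order.

```python
import re
from urllib.parse import unquote

# Escape codes copied from the recipe's get_article_url.
ESCAPES = {'0B': '.', '0C': '/', '0A': '0', '0F': '=', '0G': '&',
           '0D': '?', '0E': '-', '0N': '.com', '0L': 'http:', '0S': '//'}

# Any '0' followed by an uppercase letter is treated as an escape code.
_TOKEN = re.compile(r'0[A-Z]')


def decode_segment(segment):
    """Expand escape codes in one pass; unknown codes are left unchanged."""
    return _TOKEN.sub(lambda m: ESCAPES.get(m.group(0), m.group(0)), segment)


def url_from_origlink(origlink):
    """Recover the article URL from a link ending in .../<encoded>/story01.htm."""
    parts = origlink.split('/')
    if len(parts) >= 2 and parts[-1] == 'story01.htm':
        return decode_segment(parts[-2])
    return origlink


def url_from_summary(summary):
    """Pull the percent-encoded article URL out of a bookmark link, if any."""
    m = re.search(r'href=".+?bookmark.cfm.+?link=(.+?)"', summary)
    return unquote(m.group(1)) if m else None
```

With a made-up link such as `http://feeds.example.invalid/c/1/0L0Swww0Borlandosentinel0N0Cbusiness0Cos0Eexample0Bstory/story01.htm`, `url_from_origlink` should give back `http://www.orlandosentinel.com/business/os-example.story`.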
#3
Enthusiast
Posts: 33
Karma: 10
Join Date: Apr 2012
Device: Amazon Kindle Paperwhite
Thank you Kovid, the updated version you posted worked perfectly.