03-17-2013, 10:52 AM   #1
nickredding
onlinenewsreader.net
Bug in web/feeds/__init__ processing feed article date

There is a problem in parsing RSS feeds: article dates are not being processed properly. In the routine parse_article, the article date is extracted as item.get('date_parsed', time.gmtime()), which returns the current time because the feed/article key 'date_parsed' is not found.
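To illustrate the lookup described above, here is a minimal sketch. The plain dict is a hypothetical stand-in for a parsed feed entry (the real object is whatever feedparser returns), and extract_date is a name made up for this example that just wraps the expression quoted from parse_article.

```python
import time

def extract_date(item):
    # Same expression as quoted from parse_article: if 'date_parsed' is
    # missing, the default (the current time) is returned instead.
    return item.get('date_parsed', time.gmtime())

# Hypothetical entry that only carries 'published_parsed' (set to the
# Unix epoch here so the difference is obvious).
item = {
    'title': 'Example article',
    'published_parsed': time.gmtime(0),
}

extracted = extract_date(item)
print(time.strftime('%Y-%m-%d', item['published_parsed']))  # the feed's date
print(time.strftime('%Y-%m-%d', extracted))                 # today's date instead
```

The real publication date is present in the entry but is never consulted, because the lookup only asks for 'date_parsed'.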

In two major RSS feeds (NYTimes and Globe and Mail) the feed/article date key is 'published_parsed'. I realize there are variants of RSS feed formats, but I believe (based on looking at feedparser.py) that 'published_parsed' is a valid, non-deprecated key.

As a result, every article in such a feed appears to have been published at download time, so any RSS feed-based recipes that use feeds with the 'published_parsed' key are not obeying oldest_article restrictions.
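A rough sketch of why the age check stops working. This assumes oldest_article is a number of days and that the cutoff is a simple age comparison. is_too_old is a hypothetical helper written for this example, not the actual calibre code, and the fixed "now" value is arbitrary.

```python
import calendar
import time

SECONDS_PER_DAY = 86400

def is_too_old(date_struct, oldest_article_days, now):
    # Hypothetical age check: compare the article's age (UTC) to the limit.
    age_seconds = now - calendar.timegm(date_struct)
    return age_seconds > oldest_article_days * SECONDS_PER_DAY

now = 1363517520  # arbitrary fixed "current time" (epoch seconds) for the demo

# Hypothetical entry published 30 days before "now", using 'published_parsed'.
item = {'published_parsed': time.gmtime(now - 30 * SECONDS_PER_DAY)}

# Lookup as described in the report: the key is missing, so "now" is used.
buggy_date = item.get('date_parsed', time.gmtime(now))
real_date = item['published_parsed']

print(is_too_old(buggy_date, 7, now))  # False: article looks brand new
print(is_too_old(real_date, 7, now))   # True: article is 30 days old
```

With the fallback date the computed age is always zero, so no value of oldest_article can exclude the article.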

I'm not sure what the best fix is. Probably the date should be extracted using 'published_parsed' if the 'date_parsed' key isn't there (or vice versa), so that feeds which use 'date_parsed' don't break.
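One possible shape for that fallback, as a sketch only. get_article_date and its keys parameter are hypothetical names, a plain dict stands in for the feed entry, and the key order ('date_parsed' first) is just one of the two options mentioned above.

```python
import time

def get_article_date(item, keys=('date_parsed', 'published_parsed')):
    # Try each candidate key in order and return the first date found.
    for key in keys:
        value = item.get(key)
        if value is not None:
            return value
    # Neither key is present: fall back to the current time, as before.
    return time.gmtime()

published_only = {'published_parsed': time.gmtime(0)}
date_only = {'date_parsed': time.gmtime(86400)}

print(get_article_date(published_only) == time.gmtime(0))   # True
print(get_article_date(date_only) == time.gmtime(86400))    # True
```

Feeds that already supply 'date_parsed' behave exactly as they do now, and the current-time default is only used when neither key exists.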