Bug in web/feeds/__init__ processing feed article date
There is a problem in parsing RSS feeds: article dates are not being processed properly. In the routine parse_article, the article date is extracted as item.get('date_parsed', time.gmtime()), which returns the current time whenever the feed/article key 'date_parsed' is not found.
In two major RSS feeds (NYTimes and Globe and Mail) the feed/article date key is 'published_parsed'. I realize there are variants of RSS feed formats but I believe (based on looking at feedparser.py) that 'published_parsed' is a valid, non-deprecated key.
As a result, any RSS feed-based recipes that use feeds with the 'published_parsed' key are not obeying oldest_article restrictions.
I'm not sure what the best fix is--probably to extract using 'published_parsed' if the 'date_parsed' key isn't there (or vice versa) to avoid breaking feeds that use 'date_parsed'.
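A minimal sketch of that fallback, assuming the item behaves like a dict (as the existing item.get call suggests). The helper name article_date, the DATE_KEYS tuple, and the now parameter are hypothetical and are not part of the existing code; this is one possible shape for the fix, not a tested patch.

```python
import time

# Hypothetical key order: keep 'date_parsed' first so feeds that already
# work are unaffected, then fall back to 'published_parsed'.
DATE_KEYS = ('date_parsed', 'published_parsed')


def article_date(item, now=None):
    """Return the first parsed date present on the item.

    Falls back to the current UTC time (or `now`, if given) only when
    none of the known keys holds a value.
    """
    for key in DATE_KEYS:
        value = item.get(key)
        if value is not None:
            return value
    return time.gmtime() if now is None else now
```

In parse_article, the existing item.get('date_parsed', time.gmtime()) call could then be replaced with article_date(item). Checking for None rather than only for a missing key also covers the case where a key is present but its date could not be parsed.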