Old 03-17-2013, 10:52 AM   #1
nickredding
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
Bug in web/feeds/__init__ processing feed article date

There is a problem in parsing RSS feeds: article dates are not being processed properly. In the routine parse_article, the article date is extracted as item.get('date_parsed', time.gmtime()), which returns the current time because the key 'date_parsed' is not found in the feed article.

In two major RSS feeds (NYTimes and Globe and Mail) the feed/article date key is 'published_parsed'. I realize there are variants of RSS feed formats but I believe (based on looking at feedparser.py) that 'published_parsed' is a valid, non-deprecated key.

As a result, any RSS feed-based recipes that use feeds with the 'published_parsed' key are not obeying oldest_article restrictions.
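
To illustrate why the age limit stops working, here is a minimal sketch. The helper is_too_old is hypothetical and is not calibre's actual filtering code; it only reuses the item.get('date_parsed', time.gmtime()) lookup quoted above and assumes the age check compares that date against the current time.

```python
import calendar
import time

def is_too_old(item, oldest_article_days, now=None):
    """Hypothetical age check that reads only the 'date_parsed' key."""
    now = time.time() if now is None else now
    # Same lookup as quoted above: falls back to the current UTC time.
    published = item.get('date_parsed', time.gmtime())
    age_days = (now - calendar.timegm(published)) / 86400.0
    return age_days > oldest_article_days

now = time.time()
ten_days_ago = time.gmtime(now - 10 * 86400)

# Key present: the article is correctly seen as 10 days old.
print(is_too_old({'date_parsed': ten_days_ago}, 1, now))       # True
# Key missing: the default makes the article look brand new, so it is kept.
print(is_too_old({'published_parsed': ten_days_ago}, 1, now))  # False
```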

I'm not sure what the best fix is. Probably to extract using 'published_parsed' if the 'date_parsed' key isn't there (or vice versa), to avoid breaking feeds that use 'date_parsed'.
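
A rough sketch of that fallback, assuming the feed item behaves like a dict. The function name article_date and the DATE_KEYS tuple are made up for illustration, and this is not necessarily how the fix should be written inside calibre.

```python
import time

# Keys tried in order. 'date_parsed' stays first so feeds that already
# work keep their current behavior. Both names here are hypothetical.
DATE_KEYS = ('date_parsed', 'published_parsed')

def article_date(item, keys=DATE_KEYS):
    """Return the first parsed date found in item, else the current UTC time."""
    for key in keys:
        parsed = item.get(key)
        if parsed is not None:
            return parsed
    return time.gmtime()

item = {'published_parsed': time.gmtime(0)}
print(time.strftime('%Y-%m-%d', article_date(item)))  # 1970-01-01
```

Trying the keys in a fixed order means an item that carries both keys still resolves to 'date_parsed', so existing recipes should see no change.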
Old 03-17-2013, 06:51 PM   #2
jumafl
Enthusiast
Posts: 32
Karma: 10
Join Date: Apr 2012
Device: Amazon Kindle Paperwhite
Could this be the same issue that is causing the oldest_article setting not to skip old articles in basic recipes? For me, the issue started with Calibre 0.9.23.

For example, I have a basic recipe to pull RSS feeds for betanews.com. The fields in this basic recipe are as follows (to recreate this issue, you will need to create a basic recipe and enter these fields):

class BasicUserRecipe1363558652(AutomaticNewsRecipe):
    title = u'Beta News'
    oldest_article = 1
    max_articles_per_feed = 100
    auto_cleanup = True
    feeds = [(u'Top Stories', u'http://feeds.betanews.com/bn')]

Using Calibre 0.9.21, this basic recipe correctly returns only articles posted in the last day. Using Calibre 0.9.23, this recipe returns all articles in the feed at "http://feeds.betanews.com/bn". The oldest is currently 4 days old.

The same issue is happening with my other basic recipes. All were running correctly until I installed 0.9.23.
Old 03-17-2013, 06:59 PM   #3
nickredding
onlinenewsreader.net
^ I don't know when this problem arose, but yes, this is quite likely the cause.
Old 03-17-2013, 09:47 PM   #4
jumafl
Enthusiast
I can't say whether it is related, but the Calibre 0.9.23 changelog shows this bug fix:

News download: Update the library used to parse RSS feeds.
Closes tickets: 1152852
Old 03-18-2013, 01:48 AM   #5
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
http://bazaar.launchpad.net/~kovid/c...revision/14619
All times are GMT -4.