05-30-2009, 02:48 PM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: May 2009
Device: multiple
|
Using PubDate in print_version of custom news source
Hi,
I'm rather new to this, but is there an easy way to use the pubdate information for the print_version of a custom news source? For example, the article link is at:
Code:
http://www.somewebsite.com/.../article_idnumber
and the print version is at:
Code:
http://www.somewebsite.com/.../print/20090529/idnumber
The feed gives the pubdate as:
Code:
<pubDate>Fri, 29 May 2009 23:31 -0400</pubDate>
For example:
Code:
def print_version(self, url):
    return 'http://www.somewebsite.com/../print/' + pubdate + '/' + url.rsplit('/article_')[1]
Any help would be appreciated. Thanks! |
05-30-2009, 02:50 PM | #2 |
creator of calibre
Posts: 44,321
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Rather than using pubdate (which may not always work), you can simply fetch the non-print URL in the print_version method using the index_to_soup method, and extract the URL of the print version from that page.
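A minimal sketch of that idea, under stated assumptions: the article page contains an anchor with a hypothetical class "print" pointing at the print version, and the method lives in your BasicNewsRecipe subclass (shown here as a plain mixin so the snippet does not need calibre to be imported). The selector is made up for illustration and has to be adapted to the real page.

```python
from urllib.parse import urljoin


class PrintLinkMixin:
    """Sketch only: put print_version in your own BasicNewsRecipe subclass."""

    def print_version(self, url):
        # index_to_soup is provided by BasicNewsRecipe; it downloads the
        # page and returns a BeautifulSoup object.
        soup = self.index_to_soup(url)
        # Hypothetical selector: an <a class="print" href="..."> link.
        link = soup.find('a', attrs={'class': 'print'})
        if link is None:
            # No print link found, fall back to the normal article URL.
            return url
        # Resolve relative hrefs against the article URL.
        return urljoin(url, link['href'])
```

Note that this costs one extra download per article, since the normal page is fetched just to find the link.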
|
05-30-2009, 03:07 PM | #3 |
Junior Member
Posts: 2
Karma: 10
Join Date: May 2009
Device: multiple
|
Thanks, but the print version link that I'm trying to use is not included on the non print web page. I know it sounds odd, but the non-print web page has changed in that they use Java now to generate the printed version. I discovered an alternative method of creating a clean, text only version, but this alternative method uses the pubdate as part of the URL.
Rather than rewrite my entire recipe, I was wondering if there was a quick and easy way to just change my print version URL so that it includes the pubdate. |
05-30-2009, 03:28 PM | #4 |
creator of calibre
Posts: 44,321
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You'd basically have to re-implement parse_feeds in your recipe. You can copy and paste it from BasicNewsRecipe and change it slightly to extract the pubdate from the feed.
|
05-30-2009, 05:52 PM | #5 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
In this case I would recommend using the get_article_url method instead of print_version. In get_article_url you have access to the XML of each feed entry, so you can replace every article URL with its print version:
Code:
def get_article_url(self, article):
    raw_url = article.get('link', None)
    date_url = article.get('pubDate', None)
    # Extract values from date_url here; we assume you end up with
    # the final version of the date string in the datestr variable.
    datestr = "<processed valid value>"
    art_id = raw_url.rsplit('/article_')[1]
    # Build the print URL directly. Calling replace() on the site prefix
    # would leave the original article path appended after art_id.
    nurl = 'http://www.somewebsite.com/print/' + datestr + '/' + art_id
    return nurl
You can see examples of this in various recipes, such as La prensa, NIN etc. |
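The "extract values" step for datestr could look roughly like this. It is only a sketch: pubdate_to_datestr is a hypothetical helper name, and it assumes the site wants the date exactly as written in the feed (YYYYMMDD, no conversion to UTC), which matches the single example given in the first post.

```python
from email.utils import parsedate_tz


def pubdate_to_datestr(pubdate):
    """Turn an RFC 822 style pubDate string into YYYYMMDD, or None if unparseable."""
    if not pubdate:
        return None
    parsed = parsedate_tz(pubdate)
    if parsed is None:
        return None
    year, month, day = parsed[:3]
    # Uses the date as written in the feed; the UTC offset is ignored.
    return '%04d%02d%02d' % (year, month, day)


print(pubdate_to_datestr('Fri, 29 May 2009 23:31 -0400'))  # prints 20090529
```

Inside get_article_url you would call it on the date value and fall back to the plain link when it returns None, so that an entry with a missing or odd date does not break the whole download.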