Using PubDate in print_version of custom news source

mobilereader72 · 05-30-2009, 02:48 PM

Hi,

I'm rather new to this, but is there an easy way to use the pubdate information for the print_version of a custom news source?

For example, the article link is at:

Code:

http://www.somewebsite.com/.../article_idnumber

But print version is at:

Code:

http://www.somewebsite.com/.../print/20090529/idnumber

Here 20090529 is the publication date of the article linked above, whereas the feed's XML source lists the pubdate as:

Code:

<pubDate>Fri, 29 May 2009 23:31 -0400</pubDate>
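A minimal sketch of the conversion being asked about: turn the `<pubDate>` string into `20090529` and splice it into the print URL. It uses Python's standard `email.utils.parsedate_tz`, which is an editorial choice rather than something from the thread. The URL pattern is copied from the two examples above, with the elided `...` path kept as-is. `parsedate_tz` keeps the date in the feed's own timezone (`-0400` here). Converting to UTC first, which is what feedparser does for its `*_parsed` fields, would move `23:31 -0400` to 30 May and give `20090530` instead.

```python
from email.utils import parsedate_tz


def pubdate_to_datestr(raw_pubdate):
    """Turn an RFC 822 <pubDate> string into the YYYYMMDD form used in the print URL."""
    parsed = parsedate_tz(raw_pubdate)
    if parsed is None:
        raise ValueError('unparseable pubDate: %r' % raw_pubdate)
    # Year, month, day exactly as written in the feed (feed's own timezone, not UTC)
    return '%04d%02d%02d' % parsed[:3]


def print_url(article_url, raw_pubdate):
    """Map .../article_<id> to .../print/<YYYYMMDD>/<id> (pattern taken from the post)."""
    base, art_id = article_url.rsplit('/article_', 1)
    return '%s/print/%s/%s' % (base, pubdate_to_datestr(raw_pubdate), art_id)


print(pubdate_to_datestr('Fri, 29 May 2009 23:31 -0400'))  # prints 20090529
```

With the example values, `print_url('http://www.somewebsite.com/.../article_idnumber', 'Fri, 29 May 2009 23:31 -0400')` gives `http://www.somewebsite.com/.../print/20090529/idnumber`.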

Calibre seems to parse the pubdate fine with BasicNewsRecipe, so is there some global variable I can use to include the pubdate in my print_version URL?

For example:

Code:

def print_version(self, url):
    # pubdate is not defined here; this is the kind of thing I'd like to be able to do
    return 'http://www.somewebsite.com/../print/' + pubdate + '/' + url.rsplit('/article_')[1]

From what I've read in the documentation, I will need to parse the feeds again with parse_feeds() to extract the pubdate. Is this correct? Does anyone have an example of how to do this? I can't seem to find any recipes that use parse_feeds().

Any help would be appreciated. Thanks!

kovidgoyal · 05-30-2009, 02:50 PM

Rather than using pubdate (which may not always work), you can simply fetch the non-print URL in the print_version method using the index_to_soup method, and extract the URL of the print version from that page.
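A standalone sketch of the idea: given the HTML of the non-print article page, find the link to the print version. In a recipe, `index_to_soup` gives you a BeautifulSoup tree you could search instead; this version uses the standard library's `html.parser` so it runs outside calibre. The `/print/` marker is a hypothetical assumption about the site's markup, and `find_print_link` and `PrintLinkFinder` are illustrative names, not calibre API.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PrintLinkFinder(HTMLParser):
    """Remember the href of the first <a> tag whose href contains `marker`."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.found = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if self.found is None and tag == 'a':
            href = dict(attrs).get('href')
            if href and self.marker in href:
                self.found = href


def find_print_link(html, page_url, marker='/print/'):
    """Return the absolute print-version URL linked from `html`, or None."""
    finder = PrintLinkFinder(marker)
    finder.feed(html)
    finder.close()
    if finder.found is None:
        return None
    # Resolve relative links such as '/print/...' against the article URL
    return urljoin(page_url, finder.found)
```

Inside a recipe, `print_version(self, url)` would fetch the page for `url`, look for the link the same way, and return it (or `url` itself when nothing is found).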

mobilereader72 · 05-30-2009, 03:07 PM

Thanks, but the print version link that I'm trying to use is not included on the non-print web page. I know it sounds odd, but the non-print page has changed: the site now uses JavaScript to generate the printed version. I discovered an alternative way to get a clean, text-only version, but it uses the pubdate as part of the URL.

Rather than rewrite my entire recipe, I was wondering if there was a quick and easy way to just change my print version URL so that it includes the pubdate.

kovidgoyal · 05-30-2009, 03:28 PM

You'd basically have to re-implement parse_feeds in your recipe. You can just copy and paste it from BasicNewsRecipe and change it slightly to extract the pubdate from the feed.
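Re-implementing calibre's `parse_feeds` is not shown here. As a simpler stand-in for "parse the feed again to get the pubdate", the sketch below reads the RSS XML once with the standard library's `xml.etree.ElementTree`, builds a lookup from each item's `<link>` to its date, and has a print_version-style helper consult it. This is a swapped-in technique, not calibre's own code. `dates_by_link` is a hypothetical name, and the extra `lookup` argument on `print_version` is an assumption (calibre's real `print_version(self, url)` only receives the URL, so you would keep the lookup on the recipe instance).

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_tz


def dates_by_link(rss_xml):
    """Map each <item>'s <link> to its <pubDate> as YYYYMMDD (feed's own timezone)."""
    root = ET.fromstring(rss_xml)
    lookup = {}
    for item in root.iter('item'):
        link = (item.findtext('link') or '').strip()
        parsed = parsedate_tz(item.findtext('pubDate') or '')
        if link and parsed:
            lookup[link] = '%04d%02d%02d' % parsed[:3]
    return lookup


def print_version(url, lookup):
    """Hypothetical helper: rewrite .../article_<id> using the date found in `lookup`."""
    datestr = lookup.get(url)
    if datestr is None or '/article_' not in url:
        return url  # no known date: keep the normal article URL
    base, art_id = url.rsplit('/article_', 1)
    return '%s/print/%s/%s' % (base, datestr, art_id)
```

The design trade-off is an extra download of the feed in exchange for leaving the rest of the recipe untouched.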

kiklop74 · 05-30-2009, 05:52 PM

In this case I would recommend using the get_article_url method instead of print_version. In get_article_url you have access to the parsed feed entry (the XML item), so you can replace every article URL with its print version:

Code:

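    def get_article_url(self, article):
        # Needs "from email.utils import parsedate_tz" at the top of the recipe
        raw_url = article.get('link', None)
        # Raw <pubDate> text, e.g. 'Fri, 29 May 2009 23:31 -0400'. Newer feedparser
        # versions store it as 'published'; older ones may use 'updated' (assumption)
        raw_date = article.get('published') or article.get('updated')
        parsed = parsedate_tz(raw_date) if raw_date else None
        if not raw_url or '/article_' not in raw_url or parsed is None:
            return raw_url  # no usable date: fall back to the normal article URL
        # parsedate_tz keeps the feed's own timezone, so 23:31 -0400 stays on 20090529
        datestr = '%04d%02d%02d' % parsed[:3]
        # '.../article_idnumber' -> base '...' and id 'idnumber'
        base, art_id = raw_url.rsplit('/article_', 1)
        return base + '/print/' + datestr + '/' + art_id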

With code like this you do not need print_version at all.

You can see examples of this in various recipes, such as La prensa, NIN, etc.

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Best English News Source?	Gideon	Reading Recommendations	24	11-16-2010 05:14 PM
Rename output Title of (custom) news source	ischeriad	Calibre	4	02-16-2010 06:14 AM
Custom news source - for forums	RichD	Calibre	0	01-12-2010 11:05 AM
Custom news source	JayCeeEll	Calibre	2	11-14-2009 04:01 AM
libprs500 and custom news feeds	scottsan	Calibre	1	04-03-2008 02:49 PM
