Thread: web2lrf
05-22-2008, 01:31 AM #324
Ben_B
Junior Member
 
Posts: 7
Karma: 10
Join Date: Apr 2008
Location: British Columbia, Canada
Device: Sony PRS-505
As for the links to the full stories from the Globe and Mail: in the profile I posted earlier, I used the following function to retrieve the full stories from the Globe Investor web site, which produces a very clean printed version without any extra HTML. I used it to create printed versions of the news stories from the Globe and Mail RSS feeds (e.g., http://www.theglobeandmail.com/gener...s/BN/Front.xml).

def print_version(self, url):
    # Map a Globe and Mail story URL to the Globe Investor print-version URL
    parts = url.rsplit('.', 3)
    return 'http://www.globeinvestor.com/servlet/ArticleNews/print/' + url.split('/story/', 1)[1].split('.', 1)[0] + '/' + parts[2] + '/' + parts[3]
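To test the mapping outside Calibre, here is a standalone sketch of the same transformation. It uses a regular expression instead of counting dots from the right, so a stray dot later in the path does not shift the pieces. It assumes story URLs keep the /story/SECTION.DATE.rest shape. The sample URL is hypothetical, made up only to match that shape, and print_version_url is an illustrative name, not part of Calibre.

```python
import re

PRINT_BASE = 'http://www.globeinvestor.com/servlet/ArticleNews/print/'

# Assumed story-URL shape: .../story/<SECTION>.<DATE>.<slug and path>
STORY_RE = re.compile(r'/story/([^./]+)\.([^./]+)\.(.+)$')


def print_version_url(url):
    """Return the Globe Investor print URL, or url unchanged if it doesn't match."""
    m = STORY_RE.search(url)
    if m is None:
        return url  # unfamiliar URL shape: leave it alone
    section, date, rest = m.groups()
    return PRINT_BASE + '/'.join([section, date, rest])


# Hypothetical story URL, used only for illustration
sample = ('http://www.theglobeandmail.com/servlet/story/'
          'RTGAM.20080521.wbush21/BNStory/International/home')
print(print_version_url(sample))
# http://www.globeinvestor.com/servlet/ArticleNews/print/RTGAM/20080521/wbush21/BNStory/International/home
```

Inside a recipe, print_version(self, url) could simply return print_version_url(url). Falling back to the original URL means an unexpected link still gets downloaded (just not as the print version) instead of raising an IndexError.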

The problem I ran into is that, for most stories, the link to the full story is contained in the <feedburner:origLink> tag. With the old libprs500, I was using url_search_order = ['feedburner:origlink'], which seemed to work; however, this variable no longer seems to exist in Calibre's BasicNewsRecipe. I can't figure out how to make Calibre follow the links in the <feedburner:origLink> tags. I'm guessing I need to handle this through another function?
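One possible way to get the old url_search_order behaviour back is to choose the article URL yourself. The sketch below assumes the feed is parsed with feedparser, which (as far as I know) exposes a namespaced element like <feedburner:origLink> under the key feedburner_origlink. The function name pick_article_url, the key list and the sample entry are hypothetical. If your Calibre version's BasicNewsRecipe provides a get_article_url(self, article) hook (worth confirming against the installed source), the same logic could go in an override of that method.

```python
# Keys tried in order, mimicking libprs500's url_search_order setting.
# 'feedburner_origlink' ASSUMES feedparser's naming for <feedburner:origLink>.
URL_SEARCH_ORDER = ('feedburner_origlink', 'link')


def pick_article_url(article, search_order=URL_SEARCH_ORDER):
    """Return the first non-empty URL found under the keys in search_order."""
    for key in search_order:
        url = article.get(key)
        if url:
            return url
    return None  # no usable link in this entry


# Hypothetical feed entry (a plain dict standing in for a feedparser entry)
entry = {
    'link': 'http://feeds.example.com/~r/BN/Front/~3/abc',
    'feedburner_origlink': 'http://www.theglobeandmail.com/servlet/story/'
                           'RTGAM.20080521.wbush21/BNStory/International/home',
}
print(pick_article_url(entry))
```

Inside a recipe this would be roughly: def get_article_url(self, article): return pick_article_url(article). The FeedBurner redirect in 'link' is kept as the fallback so entries without an origLink are not dropped.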