I'm having trouble getting the print versions of articles from the Orlando Sentinel. The problem is that they have completely different article numbers for the regular and print-friendly versions of a feature.
For instance:
In this RSS feed:
http://feeds.feedburner.com/orlandosentinel
Regular version with the link provided in RSS:
http://www.orlandosentinel.com/business/orl-existing-home-sales-orlando-100908,0,2581414.story
Print-friendly version (link is found on regular article's page):
http://www.orlandosentinel.com/business/orl-existing-home-sales-orlando-100908,0,95752,print.story
The link to the print-friendly version appears like this in the regular article's HTML:
Code:
<div><img src="/common/images/icons/atools-printer.gif" alt="Print" /><a href="/business/orl-existing-home-sales-orlando-100908,0,95752,print.story" rel="nofollow" >Print</a></div>
What would be the best way to get the printable versions instead of the regular articles?
I already tried the following, but I think it is only looking at the RSS feed itself rather than at each article page, so it did not help.
Code:
def print_version(self, url):
    soup = self.index_to_soup(url)
    for item in soup.findAll('a', attrs={'rel':'nofollow'}):
        strhref = item['href']
        match = strhref.find('print.story')
        if match > -1:
            return strhref
    return None
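In case it helps to see the lookup on its own, here is a minimal standalone sketch of what I am after, outside of the recipe. The names `PrintLinkFinder` and `find_print_url` are made up for illustration, and it uses only the Python standard library instead of the recipe's `index_to_soup`. It assumes the print link always carries `rel="nofollow"` and ends in `print.story`, as in the snippet above. One thing it does differently from my attempt: the `href` on the page is site-relative, so the sketch joins it with the article URL before returning it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PrintLinkFinder(HTMLParser):
    """Hypothetical helper: remembers the first <a rel="nofollow"> whose
    href ends in 'print.story'."""

    def __init__(self):
        super().__init__()
        self.print_href = None

    def handle_starttag(self, tag, attrs):
        # Only look at anchors, and stop once a match has been found.
        if tag != 'a' or self.print_href is not None:
            return
        attrs = dict(attrs)
        href = attrs.get('href') or ''
        if attrs.get('rel') == 'nofollow' and href.endswith('print.story'):
            self.print_href = href


def find_print_url(article_url, article_html):
    """Return the absolute print-friendly URL found in article_html,
    or None if the page has no matching link."""
    finder = PrintLinkFinder()
    finder.feed(article_html)
    if finder.print_href is None:
        return None
    # The href is site-relative, so resolve it against the article URL.
    return urljoin(article_url, finder.print_href)
```

Given the regular article URL and the HTML of that page, `find_print_url` should hand back the full `...,print.story` address, or `None` when no print link is present.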
Thanks in advance for any help you can provide.