Quote:
Originally Posted by spedinfargo
Good find! I wonder if there's a way to add some logic to first go here to get the link:
http://sportsillustrated.cnn.com/vau...home/index.htm
It should always be the first link that looks like this:
<div id="ecomthumb_latest_11541"></div>
Is it possible to do a "two-step" process like this?
Yes.
Do something like:
Code:
INDEX2 = 'http://sportsillustrated.cnn.com/vault/cover/home/index.htm'
followed by changing
Code:
soup = self.index_to_soup(self.INDEX)
to
Code:
soup = self.index_to_soup(self.INDEX2)
in parse_index.
Then change
Code:
cover = soup.find('div', attrs={'alt': 'Read All Articles', 'style': 'vertical-align:bottom;'})
if cover:
    currentIssue = cover.parent['href']
to whatever is needed to produce currentIssue from the new index page.
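As a rough sketch of that first step, here is one way to pull the number out of the first div whose id looks like ecomthumb_latest_NNNNN. It uses only the standard library so it can be run outside calibre. The names LatestCoverFinder and find_latest_issue_id are made up for illustration, and the sample HTML is invented apart from the one div quoted above. How that number maps to the actual issue URL is not something I know, so that part is left out.

```python
import re
from html.parser import HTMLParser

# Pattern for the id quoted above; assumes only the numeric suffix changes per issue.
LATEST_ID = re.compile(r'^ecomthumb_latest_(\d+)$')


class LatestCoverFinder(HTMLParser):
    """Hypothetical helper: records the suffix of the first matching <div id=...>."""

    def __init__(self):
        super().__init__()
        self.issue_id = None

    def handle_starttag(self, tag, attrs):
        # Ignore non-div tags, and stop looking once the first match is found.
        if tag != 'div' or self.issue_id is not None:
            return
        match = LATEST_ID.match(dict(attrs).get('id') or '')
        if match:
            self.issue_id = match.group(1)


def find_latest_issue_id(html):
    """Return the numeric suffix of the first ecomthumb_latest_* div, or None."""
    finder = LatestCoverFinder()
    finder.feed(html)
    return finder.issue_id


# Invented sample page; only the ecomthumb_latest_11541 div comes from the thread.
sample = ('<div id="nav"></div>'
          '<div id="ecomthumb_latest_11541"></div>'
          '<div id="ecomthumb_latest_11540"></div>')
print(find_latest_issue_id(sample))  # prints 11541
```

Inside the recipe itself you would probably not need the parser class, since the soup returned by index_to_soup should accept a compiled regex as an attribute value, e.g. soup.find('div', attrs={'id': LATEST_ID}). Check for None before using the result, the same way the existing code checks cover.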