Quote:
Originally Posted by Starson17
Here's a start:
I had a few minutes to finish parse_index:
Code:
INDEX = 'http://tsn.ca/nhl/story/?id=nhl'

def parse_index(self):
    feeds = []
    soup = self.index_to_soup(self.INDEX)
    # Each section on the index page sits in a div with class 'feature'
    feed_parts = soup.findAll('div', attrs={'class': 'feature'})
    for feed_part in feed_parts:
        articles = []
        # Skip sections that have no h2 heading to use as the feed title
        if not feed_part.h2:
            continue
        feed_title = feed_part.h2.string
        article_parts = feed_part.findAll('a')
        for article_part in article_parts:
            article_title = article_part.string
            article_date = ''
            article_url = 'http://tsn.ca/' + article_part['href']
            articles.append({'title': article_title, 'url': article_url,
                             'description': '', 'date': article_date})
        if articles:
            feeds.append((feed_title, articles))
    return feeds
All you need to do now is remove the junk.
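In a calibre recipe, stripping the junk from the fetched article pages is normally done with keep_only_tags and remove_tags. Here's a minimal sketch; the class names ('storyContent', 'sidebar', 'adSpace', 'socialTools') are placeholders and would need to be checked against the markup TSN actually uses:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class TSNNHL(BasicNewsRecipe):
    title = 'TSN NHL'
    INDEX = 'http://tsn.ca/nhl/story/?id=nhl'

    # Keep only the article body div; class name is a guess, inspect the
    # page source and replace it with the real one.
    keep_only_tags = [dict(name='div', attrs={'class': 'storyContent'})]

    # Drop navigation, ads and social widgets left inside the kept block.
    remove_tags = [
        dict(name='div', attrs={'class': ['sidebar', 'adSpace', 'socialTools']}),
        dict(name='script'),
    ]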