Quote:
Originally Posted by Nexus
Correct.
Quote:
someone is kind enough to give me a hint
Look at some samples. These all use parse_index:
Code:
DrawAndCook.recipe
akter.recipe
atlantic.recipe
auto_prove.recipe
axxon_magazine.recipe
billorielly.recipe
borba.recipe
brand_eins.recipe
businessworldin.recipe
bwmagazine.recipe
calgary_herald.recipe
comics_com.recipe
cynewslive.recipe
cyprus_weekly.recipe
dani.recipe
daum_net.recipe
deredactie.recipe
economist.recipe
economist_free.recipe
edmonton_journal.recipe
el_cultural.recipe
elpais_impreso.recipe
elpais_semanal.recipe
eluniversalimpresa.recipe
entrepeneur.recipe
financial_times_uk.recipe
fokkeensukke.recipe
foreignaffairs.recipe
fstream.recipe
glas_srpske.recipe
go_comics.recipe
guardian.recipe
haaretz_en.recipe
harpers_full.recipe
hbr.recipe
hbr_blogs.recipe
hindu.recipe
houston_chronicle.recipe
ieeespectrum.recipe
inc.recipe
india_today.recipe
instapaper.recipe
johm.recipe
joop.recipe
kellog_faculty.recipe
kidney.recipe
lamujerdemivida.recipe
laprensa_ni.recipe
lemonde_dip.recipe
lenta_ru.recipe
losservatoreromano_it.recipe
lrb_payed.recipe
macleans.recipe
malaysian_mirror.recipe
milenio.recipe
ming_pao.recipe
monitor.recipe
montreal_gazette.recipe
national_post.recipe
ncrnext.recipe
nejm.recipe
new_york_review_of_books.recipe
new_york_review_of_books_no_sub.recipe
newsweek.recipe
newsweek_polska.recipe
nin.recipe
nymag.recipe
Since I wrote it, and it's first on the list, let's look at the relevant parts of DrawAndCook:
Code:
def parse_index(self):
    feeds = []
    for title, url in [
        ("They Draw and Cook", "http://www.theydrawandcook.com/")
    ]:
        articles = self.make_links(url)
        if articles:
            feeds.append((title, articles))
    print 'feeds are: ', feeds
    return feeds

def make_links(self, url):
    # Parse the index page into a soup we can search
    soup = self.index_to_soup(url)
    date = ''
    current_articles = []
    # Every article on this site sits in a div with class 'date-outer'
    recipes = soup.findAll('div', attrs={'class': 'date-outer'})
    for recipe in recipes:
        title = recipe.h3.a.string
        page_url = recipe.h3.a['href']
        current_articles.append({'title': title, 'url': page_url,
                                 'description': '', 'date': date})
    return current_articles
The parse_index method needs to return a list of feeds, where each feed is a tuple of a feed title and a list of articles. The structure above is set up to handle multiple feeds, but only builds a single one, and that's what you want to do, too (unless you want to build multiple feeds).
The hard part is the list of articles, and that's done in make_links. You need to find a title and a url for each article. The date and description can be left blank, or filled in, as you prefer.
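For reference, this is the shape of the value parse_index should return. The titles here are made up for illustration, and the "..." urls are placeholders:
Code:
# A list of (feed_title, article_list) tuples. Each article is a dict;
# 'title' and 'url' are required, 'description' and 'date' may be empty.
[
    ('They Draw and Cook', [
        {'title': 'First recipe',  'url': 'http://www.theydrawandcook.com/...',
         'description': '', 'date': ''},
        {'title': 'Second recipe', 'url': 'http://www.theydrawandcook.com/...',
         'description': '', 'date': ''},
    ]),
]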
You can find the url and title for each article on your page (http://tsn.ca/nhl/story/?id=nhl). Just modify the feed title and the url of your page in parse_index, then modify make_links so that the findAll finds all your links, and the for loop extracts the title and page_url for each.
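As a starting point, here is a rough sketch of the same pattern pointed at your page. It is untested, and the 'div'/'story' selector is only a placeholder; inspect the TSN page source and put in whatever tag and class actually wrap each headline:
Code:
def parse_index(self):
    feeds = []
    for title, url in [
        ("TSN NHL", "http://tsn.ca/nhl/story/?id=nhl")
    ]:
        articles = self.make_links(url)
        if articles:
            feeds.append((title, articles))
    return feeds

def make_links(self, url):
    soup = self.index_to_soup(url)
    current_articles = []
    # NOTE: 'div' with class 'story' is a guess, not the real TSN markup;
    # replace it with the element that actually wraps each headline link.
    stories = soup.findAll('div', attrs={'class': 'story'})
    for story in stories:
        a = story.find('a')
        if a is None:
            continue
        title = self.tag_to_string(a)
        page_url = a['href']
        current_articles.append({'title': title, 'url': page_url,
                                 'description': '', 'date': ''})
    return current_articles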
Simple.