11-19-2010, 10:21 AM   #7
Starson17
Quote:
Originally Posted by Nexus
Starting from this page (http://tsn.ca/nhl/story/?id=nhl), I understand I have to use the parse_index command in my recipe,
Correct.

Quote:
someone is kind enough to give me a hint
Look at some samples. These all use parse_index:
Code:
DrawAndCook.recipe
akter.recipe
atlantic.recipe
auto_prove.recipe
axxon_magazine.recipe
billorielly.recipe
borba.recipe
brand_eins.recipe
businessworldin.recipe
bwmagazine.recipe
calgary_herald.recipe
comics_com.recipe
cynewslive.recipe
cyprus_weekly.recipe
dani.recipe
daum_net.recipe
deredactie.recipe
economist.recipe
economist_free.recipe
edmonton_journal.recipe
el_cultural.recipe
elpais_impreso.recipe
elpais_semanal.recipe
eluniversalimpresa.recipe
entrepeneur.recipe
financial_times_uk.recipe
fokkeensukke.recipe
foreignaffairs.recipe
fstream.recipe
glas_srpske.recipe
go_comics.recipe
guardian.recipe
haaretz_en.recipe
harpers_full.recipe
hbr.recipe
hbr_blogs.recipe
hindu.recipe
houston_chronicle.recipe
ieeespectrum.recipe
inc.recipe
india_today.recipe
instapaper.recipe
johm.recipe
joop.recipe
kellog_faculty.recipe
kidney.recipe
lamujerdemivida.recipe
laprensa_ni.recipe
lemonde_dip.recipe
lenta_ru.recipe
losservatoreromano_it.recipe
lrb_payed.recipe
macleans.recipe
malaysian_mirror.recipe
milenio.recipe
ming_pao.recipe
monitor.recipe
montreal_gazette.recipe
national_post.recipe
ncrnext.recipe
nejm.recipe
new_york_review_of_books.recipe
new_york_review_of_books_no_sub.recipe
newsweek.recipe
newsweek_polska.recipe
nin.recipe
nymag.recipe
Since I wrote it, and it's first on the list, let's look at the relevant parts of DrawAndCook.recipe:
Code:
    def parse_index(self):
        # Return a list of (feed title, list of article dicts) tuples.
        feeds = []
        for title, url in [
                            ("They Draw and Cook", "http://www.theydrawandcook.com/")
                            ]:
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        # Debug output; self.log is the recipe's logger and replaces the old print statement.
        self.log('feeds are: ', feeds)
        return feeds

    def make_links(self, url):
        # Fetch the index page once and build one dict per article.
        date = ''
        current_articles = []
        soup = self.index_to_soup(url)
        # Each entry on the index page sits in a <div class="date-outer">.
        recipes = soup.findAll('div', attrs={'class': 'date-outer'})
        for recipe in recipes:
            title = recipe.h3.a.string
            page_url = recipe.h3.a['href']
            current_articles.append({'title': title, 'url': page_url, 'description':'', 'date':date})
        return current_articles
The parse_index method needs to return a list of feeds, where each feed is a (feed title, list of articles) tuple. The structure above is set up for multiple feeds, but only builds a single one, and that's what you want to do, too (unless you want to build multiple feeds).
The hard part is the list of articles, and that's done in make_links. Each article is a dict, and you need to find a title and a url for each one. The date and description can be left blank, or filled in, as you prefer.
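
To make the return value concrete, here's a rough standalone sketch of the structure parse_index hands back. It runs without calibre. The helper names (make_article, build_feeds), the feed title 'TSN NHL', and the article URL are all made up for illustration; only the dict keys come from the recipe above.

```python
# Sketch of the parse_index return value: a list of (feed title, articles) tuples.
# make_article and build_feeds are hypothetical helpers, not part of calibre.

def make_article(title, url, description='', date=''):
    """Build one article dict with the same keys make_links uses above."""
    return {'title': title, 'url': url, 'description': description, 'date': date}


def build_feeds(feed_title, links):
    """Turn a list of (title, url) pairs into a parse_index-style list.

    An empty list of links yields no feed at all, mirroring the
    'if articles:' check in the recipe.
    """
    articles = [make_article(title, url) for title, url in links]
    return [(feed_title, articles)] if articles else []


feeds = build_feeds('TSN NHL', [
    # Hypothetical article link, only here to show the shape of the data.
    ('Example headline', 'http://tsn.ca/nhl/story/?id=000000'),
])
```

Whatever make_links does internally, this is the shape it has to end up producing for each feed.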

You can find the url and title for each article on your page (http://tsn.ca/nhl/story/?id=nhl). Just change the feed title and url in parse_index to those of your page, then modify make_links so that the findAll finds all your links, and the for loop finds the title and page_url for each.
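
As a rough illustration of that link-finding step, here's a standalone Python 3 sketch that uses only the standard library instead of calibre's index_to_soup. The sample markup, the StoryLinkParser class, and the '/story/?id=' marker are my assumptions; I haven't checked them against the real TSN page, so look at the page source and adjust before copying the idea into a recipe.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = 'http://tsn.ca/nhl/story/?id=nhl'

# Hypothetical markup: the real page will differ.
SAMPLE_HTML = '''
<div class="headlines">
  <a href="/nhl/story/?id=111">First headline</a>
  <a href="/nhl/story/?id=222">Second <b>headline</b></a>
  <a href="/nhl/standings/">Standings</a>
</div>
'''


class StoryLinkParser(HTMLParser):
    """Collect (title, absolute url) for anchors whose href contains a marker."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []
        self._href = None   # set while we are inside a matching <a>
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href') or ''
            if self.marker in href:
                # Resolve relative links against the index page url.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ''.join(self._text).strip()
            if title:
                self.links.append((title, self._href))
            self._href = None


def make_links(html, base_url=BASE_URL, marker='/story/?id='):
    """Return article dicts in the same shape the recipe's make_links builds."""
    parser = StoryLinkParser(base_url, marker)
    parser.feed(html)
    return [{'title': t, 'url': u, 'description': '', 'date': ''}
            for t, u in parser.links]


current_articles = make_links(SAMPLE_HTML)
```

In an actual recipe you would keep soup = self.index_to_soup(url) and apply the same kind of filter to whatever findAll returns; the only point here is the pattern of picking out story links and pairing each title with an absolute url.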

Simple.

Last edited by Starson17; 11-19-2010 at 10:39 AM.