Quote:
Originally Posted by Nexus
Correct.
Quote:
someone is kind enough to give me a hint
Look at some samples. These all use parse_index:
Code:
DrawAndCook.recipe
akter.recipe
atlantic.recipe
auto_prove.recipe
axxon_magazine.recipe
billorielly.recipe
borba.recipe
brand_eins.recipe
businessworldin.recipe
bwmagazine.recipe
calgary_herald.recipe
comics_com.recipe
cynewslive.recipe
cyprus_weekly.recipe
dani.recipe
daum_net.recipe
deredactie.recipe
economist.recipe
economist_free.recipe
edmonton_journal.recipe
el_cultural.recipe
elpais_impreso.recipe
elpais_semanal.recipe
eluniversalimpresa.recipe
entrepeneur.recipe
financial_times_uk.recipe
fokkeensukke.recipe
foreignaffairs.recipe
fstream.recipe
glas_srpske.recipe
go_comics.recipe
guardian.recipe
haaretz_en.recipe
harpers_full.recipe
hbr.recipe
hbr_blogs.recipe
hindu.recipe
houston_chronicle.recipe
ieeespectrum.recipe
inc.recipe
india_today.recipe
instapaper.recipe
johm.recipe
joop.recipe
kellog_faculty.recipe
kidney.recipe
lamujerdemivida.recipe
laprensa_ni.recipe
lemonde_dip.recipe
lenta_ru.recipe
losservatoreromano_it.recipe
lrb_payed.recipe
macleans.recipe
malaysian_mirror.recipe
milenio.recipe
ming_pao.recipe
monitor.recipe
montreal_gazette.recipe
national_post.recipe
ncrnext.recipe
nejm.recipe
new_york_review_of_books.recipe
new_york_review_of_books_no_sub.recipe
newsweek.recipe
newsweek_polska.recipe
nin.recipe
nymag.recipe
Since I wrote it, and it's first on the list, let's look at the relevant parts of DrawAndCook:
Code:
def parse_index(self):
    feeds = []
    for title, url in [
        ("They Draw and Cook", "http://www.theydrawandcook.com/")
    ]:
        articles = self.make_links(url)
        if articles:
            feeds.append((title, articles))
    print 'feeds are: ', feeds
    return feeds

def make_links(self, url):
    # Parse the index page into a soup we can search
    soup = self.index_to_soup(url)
    date = ''
    current_articles = []
    # Every article on this site sits in a div with class 'date-outer'
    recipes = soup.findAll('div', attrs={'class': 'date-outer'})
    for recipe in recipes:
        title = recipe.h3.a.string
        page_url = recipe.h3.a['href']
        current_articles.append({'title': title, 'url': page_url,
                                 'description': '', 'date': date})
    return current_articles
The parse_index method needs to return a list of feeds, where each feed is a tuple of a feed title and a list of articles. The structure above is set up to handle multiple feeds, but only builds a single one, and that's what you want to do, too (unless you want to build multiple feeds).
The hard part is the list of articles, and that's done in make_links. You need to find a title and a url for each article. The date and description can be left blank, or filled in, as you prefer.
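For reference, this is the shape of the value parse_index should return. The titles here are made up for illustration, and the "..." urls are placeholders:
Code:
# A list of (feed_title, article_list) tuples. Each article is a dict;
# 'title' and 'url' are required, 'description' and 'date' may be empty.
[
    ('They Draw and Cook', [
        {'title': 'First recipe',  'url': 'http://www.theydrawandcook.com/...',
         'description': '', 'date': ''},
        {'title': 'Second recipe', 'url': 'http://www.theydrawandcook.com/...',
         'description': '', 'date': ''},
    ]),
]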
You can find the url and title for each article on your page (http://tsn.ca/nhl/story/?id=nhl). Just modify the feed title and the url of your page in parse_index, then modify make_links so that the findAll finds all your links, and the for loop extracts the title and page_url for each.
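As a starting point, here is a rough sketch of the same pattern pointed at your page. It is untested, and the 'div'/'story' selector is only a placeholder; inspect the TSN page source and put in whatever tag and class actually wrap each headline:
Code:
def parse_index(self):
    feeds = []
    for title, url in [
        ("TSN NHL", "http://tsn.ca/nhl/story/?id=nhl")
    ]:
        articles = self.make_links(url)
        if articles:
            feeds.append((title, articles))
    return feeds

def make_links(self, url):
    soup = self.index_to_soup(url)
    current_articles = []
    # NOTE: 'div' with class 'story' is a guess, not the real TSN markup;
    # replace it with the element that actually wraps each headline link.
    stories = soup.findAll('div', attrs={'class': 'story'})
    for story in stories:
        a = story.find('a')
        if a is None:
            continue
        title = self.tag_to_string(a)
        page_url = a['href']
        current_articles.append({'title': title, 'url': page_url,
                                 'description': '', 'date': ''})
    return current_articles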
Simple.