Quote:
Originally Posted by ax42
The page http://www.kulturinfo.ch/kino/db_front/showact.php contains a list of films. I would like this list to be the 'table of contents' of my eBook, with each link going to a page that gives the film details (as happens when you click the link on the webpage). I'm overriding parse_index to get a list of feeds, but I'm stuck choosing between the following two options:
a) Return a list of films, which makes each film heading a feed with one article. This seems to add an intermediate page between the 'table of contents' and the actual film description, and that intermediate page has just the one film on it.
Why? That intermediate page is pointless.
Quote:
Originally Posted by ax42
b) Return a one-item list, with all films attached as articles of this one feed. This produces a table of contents with a single entry in it. The example I've been cribbing from (The Atlantic) does this too.
This is the way to go, since the reader will show that feed's list of articles as the TOC.
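To make the difference between a) and b) concrete: parse_index returns a list of (feed title, list of article dicts) pairs. The sketch below builds both shapes in plain Python. The film titles and example.com URLs are made up, and the helper names are hypothetical; only the article keys come from the recipes in this thread.

```python
def make_article(title, url):
    """Build one article dict with the keys used by the recipes in this thread."""
    return {"title": title, "url": url, "date": "", "description": ""}


def one_feed_per_film(films):
    # Option a): every film becomes its own feed holding a single article,
    # which is what produces the one-entry intermediate page per film.
    return [(title, [make_article(title, url)]) for title, url in films]


def single_feed(films, feed_title="Articles"):
    # Option b): one feed whose articles are all the films; that article
    # list is what the reader shows as the table of contents.
    return [(feed_title, [make_article(title, url) for title, url in films])]


# Hypothetical scraped data.
films = [("Film A", "https://example.com/film-a"),
         ("Film B", "https://example.com/film-b")]

print(len(one_feed_per_film(films)))  # 2 feeds, one article each
print(len(single_feed(films)[0][1]))  # 2 articles in a single feed
```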
A good example of what you want to accomplish can be found in several recipes I wrote.
For example, the Vreme recipe does exactly what you want. There is a single page that lists all the articles we want in the feed, so I parse them using a condition specific to that page and put everything found into one feed.
Code:
# At the top of the recipe file:
from calibre import strftime

# Method of the recipe class; self.INDEX is the site's base URL.
def parse_index(self):
    articles = []
    soup = self.index_to_soup(self.INDEX)
    # Article links sit inside <h3> and <h4> headings.
    for item in soup.findAll(['h3', 'h4']):
        description = ''
        title_prefix = ''
        feed_link = item.find('a')
        # Keep only links that point to article pages.
        if feed_link and feed_link.get('href', '').startswith('/cms/view.php'):
            url = self.INDEX + feed_link['href']
            title = title_prefix + self.tag_to_string(feed_link)
            date = strftime(self.timefmt)
            articles.append({
                'title': title,
                'date': date,
                'url': url,
                'description': description,
            })
    # One feed, named after the index page's <title>.
    return [(soup.head.title.string, articles)]
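The filtering condition can also be tried outside calibre. This is a minimal sketch that swaps BeautifulSoup for Python's standard html.parser module; the class name HeadingLinkCollector and the sample HTML are hypothetical. Like item.find('a') above, it only considers the first link in each h3/h4 heading.

```python
from html.parser import HTMLParser


class HeadingLinkCollector(HTMLParser):
    """Collect (text, href) for the first <a> in each <h3>/<h4> whose href has a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.in_heading = False
        self.took_link = False    # first <a> of the current heading already seen?
        self.current_href = None  # href of the matching <a> being read
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "h4"):
            self.in_heading = True
            self.took_link = False
        elif tag == "a" and self.in_heading and not self.took_link:
            self.took_link = True
            href = dict(attrs).get("href") or ""
            if href.startswith(self.prefix):
                self.current_href = href
                self.current_text = []

    def handle_endtag(self, tag):
        if tag in ("h3", "h4"):
            self.in_heading = False
        elif tag == "a" and self.current_href is not None:
            self.links.append(("".join(self.current_text).strip(), self.current_href))
            self.current_href = None

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)


SAMPLE = """
<h3><a href="/cms/view.php?id=1">First story</a></h3>
<h4><a href="/other/page.php">Not an article</a></h4>
<p><a href="/cms/view.php?id=9">Link outside a heading</a></p>
<h4><a href="/cms/view.php?id=2">Second story</a></h4>
"""

collector = HeadingLinkCollector("/cms/view.php")
collector.feed(SAMPLE)
print(collector.links)
# [('First story', '/cms/view.php?id=1'), ('Second story', '/cms/view.php?id=2')]
```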
In your case it would look something like this:
Code:
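# Requires `from calibre import strftime` at the top of the recipe file.
def parse_index(self):
    articles = []
    soup = self.index_to_soup('http://www.kulturinfo.ch/kino/db_front/showact.php')
    # Each film title is in a <td class="title"> cell.
    for item in soup.findAll('td', attrs={'class': 'title'}):
        description = ''
        feed_link = item.find('a')
        if feed_link and feed_link.get('href'):
            # Assumes relative links of the form '../...': keep what
            # follows '..' and put the /kino base URL in front of it.
            unneeded, sep, purl = feed_link['href'].partition('..')
            url = 'http://www.kulturinfo.ch/kino' + purl
            title = self.tag_to_string(feed_link)
            date = strftime(self.timefmt)
            articles.append({
                'title': title,
                'date': date,
                'url': url,
                'description': description,
            })
    # All films go into a single feed, so the reader's TOC lists them directly.
    return [('Articles', articles)]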
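The partition('..') trick only works while every href contains '..'; for a link without it, purl is empty and url collapses to the bare prefix. One possible alternative, sketched here for Python 3 (in Python 2 the same function lives in the urlparse module), is to resolve each href against the index page URL with urljoin. The helper names and sample hrefs are hypothetical.

```python
from urllib.parse import urljoin

INDEX = "http://www.kulturinfo.ch/kino/db_front/showact.php"


def absolute_url(href):
    # Resolve the link relative to the index page, as a browser would.
    return urljoin(INDEX, href)


def partition_url(href):
    # The approach from the recipe above: keep what follows the first '..'
    # and glue it onto a fixed prefix.
    _, _, purl = href.partition("..")
    return "http://www.kulturinfo.ch/kino" + purl


for href in ["../db_front/film.php?id=1", "film.php?id=2"]:
    print(absolute_url(href), partition_url(href))
# http://www.kulturinfo.ch/kino/db_front/film.php?id=1 http://www.kulturinfo.ch/kino/db_front/film.php?id=1
# http://www.kulturinfo.ch/kino/db_front/film.php?id=2 http://www.kulturinfo.ch/kino
```

Both approaches agree on '../' links; only urljoin handles the second form.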