Old 04-18-2009, 08:35 PM   #461
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
 
 
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by ax42 View Post
The page http://www.kulturinfo.ch/kino/db_front/showact.php contains a list of films. I would like this list to be the 'table of contents' of my eBook and each link to go to a page giving the film details (as happens when you click on the webpage link). I'm busy overriding parse_index to get a list of feeds but seem to be stuck between choosing one of the following two options:

a) Return a list of films, which makes each film heading a feed with one article. This seems to lead to an intermediate page between the 'table of contents' and the actual film description, with this intermediate page having just the one film on it
Why? That intermediate page serves no purpose.

Quote:
Originally Posted by ax42 View Post
b) Return a one-item list, with all films attached as a list of articles to this one feed. This causes a table of contents with a single entry in it. The example I've been cribbing off (The Atlantic) does this too.
This is the way to go, since the TOC will then be shown on the reader as the list of articles.
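To see why option (b) gives a flat TOC, it helps to look at the shape of the value parse_index returns: a list of (feed title, list of article dicts) tuples. A minimal sketch of that structure, with made-up film data purely for illustration:

```python
# parse_index must return a list of (feed_title, articles) tuples,
# where each article is a dict with these four keys.
# All values here are invented just to show the shape.
articles = [
    {'title': 'Film A', 'date': '18 Apr 2009',
     'url': 'http://example.com/film-a', 'description': ''},
    {'title': 'Film B', 'date': '18 Apr 2009',
     'url': 'http://example.com/film-b', 'description': ''},
]

# One feed tuple wrapping every film keeps the TOC flat:
# the reader shows the feed title once, then all articles under it.
feeds = [('Films', articles)]
```

With a single feed, there is no intermediate per-film page between the TOC and the article bodies.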

A good example of what you want to accomplish can be found in several recipes I wrote.

For example, the Vreme recipe does exactly what you want: there is a single page listing all the articles that should go into the feed, so I parse it with a condition specific to that page and put the collected data into one feed.

Code:
    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        for item in soup.findAll(['h3', 'h4']):
            description = ''
            title_prefix = ''
            feed_link = item.find('a')
            # has_attr() checks that the <a> tag actually carries an href
            if feed_link and feed_link.has_attr('href') and feed_link['href'].startswith('/cms/view.php'):
                url = self.INDEX + feed_link['href']
                title = title_prefix + self.tag_to_string(feed_link)
                date = strftime(self.timefmt)
                articles.append({
                    'title': title,
                    'date': date,
                    'url': url,
                    'description': description,
                })
        return [(soup.head.title.string, articles)]

In your case it would look something like this:

Code:
    def parse_index(self):
        articles = []
        soup = self.index_to_soup('http://www.kulturinfo.ch/kino/db_front/showact.php')
        for item in soup.findAll('td', attrs={'class': 'title'}):
            description = ''
            feed_link = item.find('a')
            if feed_link and feed_link.has_attr('href'):
                # The hrefs on that page are relative ('../...'); strip the
                # leading '..' and prepend the site base to get an absolute URL.
                unneeded, sep, purl = feed_link['href'].partition('..')
                url = 'http://www.kulturinfo.ch/kino' + purl
                title = self.tag_to_string(feed_link)
                date = strftime(self.timefmt)
                articles.append({
                    'title': title,
                    'date': date,
                    'url': url,
                    'description': description,
                })
        return [('Articles', articles)]
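The URL handling above leans on str.partition, which splits on the first occurrence of the separator and returns a (before, separator, after) tuple. A standalone sketch, using a hypothetical href of the kind that page produces:

```python
# Hypothetical relative link as it might appear on the page.
href = '../db_front/showdetail.php?id=123'

# partition('..') returns (text before, separator, text after);
# with a leading '..', the "before" part is the empty string.
unneeded, sep, purl = href.partition('..')

# Prepend the site base to build an absolute URL.
url = 'http://www.kulturinfo.ch/kino' + purl
print(url)  # http://www.kulturinfo.ch/kino/db_front/showdetail.php?id=123
```

Unlike split, partition never raises when the separator is missing; it just returns the whole string in the first slot, which makes it convenient for this kind of prefix stripping.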