06-06-2011, 03:51 PM   #3
Starson17
Quote:
Originally Posted by schuster
Hi,
here is my problem for today (I'm still learning this stuff):
this recipe works, but the MOBI book doesn't show the right things.
After a few hours of testing and trying, I still can't find the cause.
Code:
class AdvancedUserRecipe(BasicNewsRecipe):

    title = 'National_Geo_test_6'
    description = '111beschreibung111'
    __author__ = 'irgendwer'
    publisher = 'jaja'
    language = 'de'
    oldest_article = 2
    max_articles_per_feed = 35
    no_stylesheets         = True
    use_embedded_content   = False
    remove_javascript      = True
    INDEX = 'http://www.nationalgeographic.de/archive/2008-05'
    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        feeds = []
        for section in soup.findAll('div', attrs={'class':'searchresult_text'}):
            section_title = self.tag_to_string(section.find('headline-middle_no_margin black'))
            articles = []
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                  url = 'http://www.nationalgeographic.de'+url
                  title = self.tag_to_string(post)
                  if str(post).find('class=') > 0:
                    klass = post['class']
                    if klass != "":
                      self.log()
                      self.log('--> post:  ', post)
                      self.log('--> url:   ', url)
                      self.log('--> title: ', title)
                      self.log('--> class: ', klass)
                      articles.append({'title':title, 'url':url, 'section':section, 'section_title':section_title})
            if articles:
                feeds.append((section_title, articles))
        return feeds

    keep_only_tags = [dict(attrs={'class':['contentbox_no_top_border']})]
Kovid is right.
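You are asking find() to look for a tag named 'headline-middle_no_margin black', when what you want is a div tag whose class is 'headline-middle_no_margin black'. No tag has that name, so find() matches nothing. Compare it with your findAll on the line just above the one that defines section_title: that call passes the tag name first and the class in attrs.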
Try this:
Code:
section_title = self.tag_to_string(section.find('div', attrs={'class':'headline-middle_no_margin black'}))
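If it helps to see the difference outside of calibre, here is a rough stand-alone sketch. It does not use BeautifulSoup at all: it uses xml.etree.ElementTree from the standard library, and find_first is a hypothetical helper I wrote only to mimic the "tag name first, attributes second" lookup. The sample markup and the title 'Tiere' are made up, not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modelled on the class names used in the recipe.
SAMPLE = """
<div class="searchresult_text">
  <div class="headline-middle_no_margin black">Tiere</div>
  <a href="/reportagen/beispiel" class="link">Beispiel</a>
</div>
"""


def find_first(root, name, attrs=None):
    """Return the first descendant of root whose tag equals name and whose
    attributes equal every key/value pair in attrs, or None if none match.

    Hypothetical stand-in for a find(name, attrs={...}) style lookup.
    """
    attrs = attrs or {}
    for element in root.iter():
        if element is root:
            continue  # only look at descendants, not the section itself
        if element.tag == name and all(
            element.get(key) == value for key, value in attrs.items()
        ):
            return element
    return None


section = ET.fromstring(SAMPLE)

# Mistake from the recipe: the class string is passed where the tag name belongs.
wrong = find_first(section, 'headline-middle_no_margin black')

# Fix: tag name first, class as an attribute filter.
right = find_first(section, 'div', {'class': 'headline-middle_no_margin black'})

section_title = right.text.strip() if right is not None else ''
print('wrong lookup matched:', wrong is not None)
print('section title:', section_title)
```

The first lookup comes back empty because no element is literally named 'headline-middle_no_margin black'; the second one finds the div because the class is checked as an attribute.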