Quote:
Originally Posted by schuster
Hi,
my problem today (I'm still learning this stuff):
this recipe works, but the resulting MOBI book doesn't show the right things.
After a few hours of testing and trying, I still can't see how to fix it.
Code:
class AdvancedUserRecipe(BasicNewsRecipe):

    title = 'National_Geo_test_6'
    description = '111beschreibung111'
    __author__ = 'irgendwer'
    publisher = 'jaja'
    language = 'de'
    oldest_article = 2
    max_articles_per_feed = 35
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    INDEX = 'http://www.nationalgeographic.de/archive/2008-05'

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        feeds = []
        for section in soup.findAll('div', attrs={'class':'searchresult_text'}):
            section_title = self.tag_to_string(section.find('headline-middle_no_margin black'))
            articles = []
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                    url = 'http://www.nationalgeographic.de'+url
                title = self.tag_to_string(post)
                if str(post).find('class=') > 0:
                    klass = post['class']
                    if klass != "":
                        self.log()
                        self.log('--> post: ', post)
                        self.log('--> url: ', url)
                        self.log('--> title: ', title)
                        self.log('--> class: ', klass)
                        articles.append({'title':title, 'url':url, 'section':section, 'section_title':section_title})
            if articles:
                feeds.append((section_title, articles))
        return feeds

    keep_only_tags = [dict(attrs={'class':['contentbox_no_top_border']})]
Kovid is right.
You are asking it to find a tag named 'headline-middle_no_margin black', when what you want is a div tag whose class is 'headline-middle_no_margin black'. The first argument to find is the tag name; the class goes in attrs. Look at your findAll on the line above the line defining section_title, which already does it that way.
Try this:
Code:
section_title = self.tag_to_string(section.find('div', attrs={'class':'headline-middle_no_margin black'}))
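If you want to see the difference outside calibre, here is a rough sketch using only Python's standard html.parser. It is not the BeautifulSoup or calibre API, just an imitation of "find by tag name" versus "find by tag name plus class". The SAMPLE markup, the FirstMatchText class and the first_text helper are all made up for this illustration, and the real page may be structured differently.

```python
from html.parser import HTMLParser

# Invented sample markup (assumption): the real page may look different.
SAMPLE = (
    '<div class="searchresult_text">'
    '<div class="headline-middle_no_margin black">Mai 2008</div>'
    '<a class="link" href="/reportagen/beispiel">Beispiel</a>'
    '</div>'
)


class FirstMatchText(HTMLParser):
    """Collect the text of the first element with the given tag name and,
    if klass is given, exactly that class attribute value."""

    def __init__(self, name, klass=None):
        super().__init__()
        self.name = name
        self.klass = klass
        self.depth = 0       # > 0 while we are inside the matched element
        self.done = False    # True once the first match has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            # Nested element with the same name: track it so we know
            # which closing tag ends the match.
            if tag == self.name:
                self.depth += 1
        elif tag == self.name and (
            self.klass is None or dict(attrs).get('class') == self.klass
        ):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.name:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def first_text(html, name, klass=None):
    """Return the text of the first matching element, or '' if none matches."""
    parser = FirstMatchText(name, klass)
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts).strip()


# The class string used as a tag name: no such tag exists, so nothing is found.
print(repr(first_text(SAMPLE, 'headline-middle_no_margin black')))          # ''
# Tag name 'div' plus the class: this finds the headline.
print(repr(first_text(SAMPLE, 'div', 'headline-middle_no_margin black')))   # 'Mai 2008'
```

The first call mirrors your original line and comes back empty, which is why the section titles never showed up; the second mirrors the corrected line.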