Hi,
I tried to add a method to fetch multipage articles (like this:
Golem.de article, RSS feed 'Hardware'), as is done in the 'Adventure Gamers' recipe.
I modified the code as follows to fit the Golem homepage:
Code:
def append_page(self, soup, appendtag, position):
    pager = soup.find('ol', attrs={'class':'list_pages'})  # class which contains the links
                                                           # to the other pages of the article
    if pager:
        nextpage = soup.find('a', attrs={'class':'icon-rsaquo'})  # next-page element
        if nextpage:
            nexturl = nextpage['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'class':'formatted'})  # the article text is in this
            for it in texttag.findAll(style=True):
                del it['style']
            newpos = len(texttag.contents)
            self.append_page(soup2,texttag,newpos)
            texttag.extract()
            pager.extract()
            appendtag.insert(position,texttag)

def preprocess_html(self, soup):
    for item in soup.findAll(style=True):
        del item['style']
    for item in soup.findAll('div', attrs={'class':'floatright'}):
        item.extract()
    self.append_page(soup, soup.body, 3)
    pager = soup.find('ol',attrs={'class':'list_pages'})
    if pager:
        pager.extract()
    return self.adeify_images(soup)
The problem is that I don't know what I have to insert here:
Code:
...
        del item['style']
    for item in soup.findAll('div', attrs={'class':'floatright'}):
        item.extract()
    self.append_page(soup, soup.body, 3)
...
I also don't know whether I have to change anything else.
In its current form, it does nothing except add the page numbers to the end of the article.
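
To spell out the intended behaviour independently of calibre, here is a minimal standalone sketch of the idea: read the article text of a page, look for the next-page link, and append the text of every following page. Everything in it is an assumption for illustration only: the `PAGES` dict, the `fetch` helper (a stand-in for `self.index_to_soup`), the page contents, and the crude regular expressions used instead of real HTML parsing. It is not the recipe itself and says nothing about what Golem's markup actually looks like beyond the class names used above.

```python
import re

# Hypothetical stand-in for the website: URL -> HTML of that page.
PAGES = {
    "/article.html": (
        '<div class="formatted">Part one.</div>'
        '<ol class="list_pages"><li>'
        '<a class="icon-rsaquo" href="/article-2.html">next</a></li></ol>'
    ),
    "/article-2.html": (
        '<div class="formatted">Part two.</div>'
        '<ol class="list_pages"><li>'
        '<a class="icon-rsaquo" href="/article-3.html">next</a></li></ol>'
    ),
    "/article-3.html": (
        '<div class="formatted">Part three.</div>'
        '<ol class="list_pages"></ol>'  # last page: no next-page link
    ),
}

# Crude patterns that only work for the made-up markup above.
NEXT_LINK = re.compile(r'<a class="icon-rsaquo" href="([^"]+)"')
BODY = re.compile(r'<div class="formatted">(.*?)</div>', re.S)


def fetch(url):
    """Hypothetical replacement for self.index_to_soup(url)."""
    return PAGES[url]


def collect_text(url, seen=None):
    """Return the article text of `url` plus that of all following pages."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against pagination loops
        return []
    seen.add(url)

    html = fetch(url)
    body = BODY.search(html)
    parts = [body.group(1)] if body else []

    nxt = NEXT_LINK.search(html)
    if nxt:  # follow the next-page link and append its text
        parts.extend(collect_text(nxt.group(1), seen))
    return parts


full_article = " ".join(collect_text("/article.html"))
print(full_article)
```

The point of the sketch is only the control flow: the text of each following page ends up after the text of the page that linked to it, and the pager itself never becomes part of the result.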

It is probably pretty simple, but I don't know how to fix it...
Can anybody help me?
Thanks!