10-28-2013, 05:44 PM   #1
lucis_lupinum
Golem.de (German tech news) multipage article

Hi,

I tried to add a method that fetches multipage articles (like this one: Golem.de article, RSS feed 'Hardware'), the way the 'Adventure Gamers' recipe does.

I modified the code like this to fit the Golem site:

Code:
def append_page(self, soup, appendtag, position):
    # <ol class="list_pages"> contains the links to the other pages of the article
    pager = soup.find('ol', attrs={'class': 'list_pages'})
    if not pager:
        return
    # link to the next page of the article
    nextpage = soup.find('a', attrs={'class': 'icon-rsaquo'})
    if nextpage:
        nexturl = nextpage['href']
        soup2 = self.index_to_soup(nexturl)
        # the article text is inside <div class="formatted">
        texttag = soup2.find('div', attrs={'class': 'formatted'})
        if texttag:  # avoid an AttributeError when the article body is not found
            for it in texttag.findAll(style=True):
                del it['style']
            newpos = len(texttag.contents)
            # recurse first, so that later pages are appended to this page's text
            self.append_page(soup2, texttag, newpos)
            texttag.extract()
            appendtag.insert(position, texttag)
    # remove the pager on every page, including the last one
    pager.extract()


def preprocess_html(self, soup):
    for item in soup.findAll(style=True):
        del item['style']
    for item in soup.findAll('div', attrs={'class': 'floatright'}):
        item.extract()
    self.append_page(soup, soup.body, 3)
    # remove any pager that is still left on the first page
    pager = soup.find('ol', attrs={'class': 'list_pages'})
    if pager:
        pager.extract()
    return self.adeify_images(soup)


The problem is that I don't know what I have to put here:
Code:
...
          del item['style']
      for item in soup.findAll('div', attrs={'class':'floatright'}):
          item.extract()
      self.append_page(soup, soup.body, 3)
...
and whether I have to change anything else.

In this form it does nothing except add the page numbers to the end of the article.
It is probably pretty simple, but I don't know how to fix it.
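
One way to narrow this down might be to test the page-following logic on its own, outside calibre. The following is only a minimal sketch using the Python standard library: the `PAGES` dict, `find_next_url` and `collect_pages` are hypothetical names made up for illustration, and the markup is a guess based on the class names used above ('list_pages', 'icon-rsaquo'), not the real Golem.de HTML. If the next-page link is never found on a real page, then the selector is the first thing to check.

```python
from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Remembers the href of the first <a> whose class list contains 'icon-rsaquo'."""

    def __init__(self):
        super().__init__()
        self.next_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and 'icon-rsaquo' in classes and self.next_url is None:
            self.next_url = attrs.get('href')


def find_next_url(html):
    """Return the next-page URL found in html, or None if there is none."""
    finder = NextLinkFinder()
    finder.feed(html)
    return finder.next_url


def collect_pages(first_url, fetch, max_pages=20):
    """Follow next-page links starting at first_url; return the URLs in visit order.

    fetch is any callable that maps a URL to its HTML (a stand-in for a real download).
    The 'seen' set and max_pages guard against pagers that link in a circle.
    """
    urls, seen, url = [], set(), first_url
    while url and url not in seen and len(urls) < max_pages:
        seen.add(url)
        urls.append(url)
        url = find_next_url(fetch(url))
    return urls


# Hypothetical three-page article; the last page has a pager but no next-page link.
PAGES = {
    'p1': '<ol class="list_pages"><li><a class="icon-rsaquo" href="p2">next</a></li></ol>',
    'p2': '<ol class="list_pages"><li><a class="icon-rsaquo" href="p3">next</a></li></ol>',
    'p3': '<ol class="list_pages"><li>3</li></ol>',
}

print(collect_pages('p1', PAGES.__getitem__))  # ['p1', 'p2', 'p3']
```

Saving one real article page to a file and passing its contents to `find_next_url` would show whether the 'icon-rsaquo' link is matched at all. If it returns None, append_page never gets as far as fetching the second page.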


Can anybody help me?
Thanks!