Old 02-18-2009, 06:46 PM   #244
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Ars Technica Now Fetching Entire Article! Super!

Quote:
Originally Posted by kiklop74
Updated recipe Ars technica with multipage news support
kiklop74,

Your latest revision of the Ars Technica recipe seems to be working fine. Thanks a million.

I guess this segment of your code is what fetches articles continued across multiple pages:

Code:
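def append_page(self, soup, appendtag, position):
    # Look for the pager block that links to the following pages.
    pager = soup.find('div', attrs={'id': 'pager'})
    if pager:
        for atag in pager.findAll('a', href=True):
            label = self.tag_to_string(atag)  # renamed from str to avoid shadowing the built-in
            if label.startswith('Next'):
                # Fetch the next page and pull out its article body.
                soup2 = self.index_to_soup(atag['href'])
                texttag = soup2.find('div', attrs={'class': 'news-item-text'})
                # Drop inline styles from the fetched page's text.
                for it in texttag.findAll(style=True):
                    del it['style']
                newpos = len(texttag.contents)
                # Recurse so any further pages are appended to this one first.
                self.append_page(soup2, texttag, newpos)
                texttag.extract()
                pager.extract()
                appendtag.insert(position, texttag)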
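If I am reading it right, the idea is to follow each "Next" link recursively and splice the later pages into the first one. Here is a stripped-down sketch of just that recursion, with no calibre or BeautifulSoup involved. The `PAGES` dict and the `collect_text` function are names I made up for illustration; they are not part of the recipe.

```python
# Hypothetical stand-in for the site: each "URL" maps to the page text
# and to the URL behind its "Next" link (None on the last page).
PAGES = {
    "article/1": {"text": "Page one. ", "next": "article/2"},
    "article/2": {"text": "Page two. ", "next": "article/3"},
    "article/3": {"text": "Page three.", "next": None},
}


def collect_text(url, pages):
    """Return the text of this page followed by the text of all later pages."""
    page = pages[url]
    text = page["text"]
    if page["next"] is not None:
        # Same shape as the recipe: handle the following pages first,
        # then attach their combined text to the current page.
        text += collect_text(page["next"], pages)
    return text


full_article = collect_text("article/1", PAGES)
print(full_article)
```

The real recipe does the same thing with soup tags instead of strings, which is why it needs the extract() and insert() calls.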
Again, thanks.

Xanthan Gum

Last edited by XanthanGum; 02-18-2009 at 06:49 PM. Reason: To correct code entry