12-20-2010, 12:04 PM   #4
hiperlink
Device: Kindle 3 Wifi only
Yet another problem with changed section types

So here goes my second issue:

The main site has sections like this:
Code:
div class='fpdocument'
     div class='section'
          ul
              li -> a -> article1
              li -> a -> article2
...
which I was able to extract via
Code:
for section in soup.findAll('div', attrs={'class':'fpdocument'}):
  # processing of section_title stripped, then finding the articles
  articles = []
  for post in section.findAll('li'):
    # processing of each article stripped (but it just works(tm))
    pass
But now I have noticed that some sections have only one article, and in that case the structure is:
Code:
div class='fpdocument'
          a class='section'
          a -> article1
end div
How can I extract those articles?
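
One idea that might work, though it is only a sketch: instead of looking for li elements, take every a inside the fpdocument div and skip the one whose class is 'section'. That should cover both layouts, assuming article links never carry class='section' themselves. In the recipe this would presumably mean looping over section.findAll('a') and ignoring the anchor with that class. To keep the example runnable outside calibre, the version below uses only Python's standard html.parser module rather than BeautifulSoup. The sample markup, the hrefs, and the names ArticleCollector and extract_articles are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample markup imitating both layouts described above.
SAMPLE = """
<div class='fpdocument'>
  <div class='section'>Section A</div>
  <ul>
    <li><a href='/a1'>article1</a></li>
    <li><a href='/a2'>article2</a></li>
  </ul>
</div>
<div class='fpdocument'>
  <a class='section' href='/sec-b'>Section B</a>
  <a href='/b1'>article1</a>
</div>
"""


class ArticleCollector(HTMLParser):
    """Collect (href, title) pairs per fpdocument div, skipping class='section' anchors."""

    def __init__(self):
        super().__init__()
        self.sections = []   # one list of (href, title) per fpdocument div
        self.div_depth = 0   # div nesting depth inside the current fpdocument div
        self.current = None  # [href, text] of the article link being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.div_depth:
                # Nested div (e.g. the section title): just track the depth.
                self.div_depth += 1
            elif attrs.get('class') == 'fpdocument':
                self.div_depth = 1
                self.sections.append([])
        elif tag == 'a' and self.div_depth and attrs.get('class') != 'section':
            # Any link in the block that is not the section title is an article.
            self.current = [attrs.get('href'), '']

    def handle_data(self, data):
        if self.current is not None:
            self.current[1] += data

    def handle_endtag(self, tag):
        if tag == 'div' and self.div_depth:
            self.div_depth -= 1
        elif tag == 'a' and self.current is not None:
            href, title = self.current
            self.sections[-1].append((href, title.strip()))
            self.current = None


def extract_articles(html):
    """Return a list of article lists, one per fpdocument div."""
    parser = ArticleCollector()
    parser.feed(html)
    parser.close()
    return parser.sections


if __name__ == '__main__':
    for articles in extract_articles(SAMPLE):
        print(articles)
```

The nice part of this approach is that there is no special case for single-article sections: the same filter handles the ul/li layout and the bare a layout. It would break if the site ever puts other links (ads, "read more") inside the fpdocument div, so those would need an extra check.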
Thanks in advance!