02-29-2012, 05:27 PM   #1
julio:map
Pages not showing up

I'm trying to build a recipe for http://www.colectivoburbuja.org/?feed=rss2

Things seemed very easy, because the pages are very "clean"...

All the data that I need is inside the DIV with ID=main.

I explicitly set auto_cleanup to False and no_stylesheets to True, to rule those out as the reason the pages are not showing up.

I am sure that the HTML is being retrieved for each downloaded article, because I added a small "trick" to my recipe that prints it (for debugging purposes)...

... but ALL of the pages retrieved are BLANK.

Can somebody please help me understand why?

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1330197191(BasicNewsRecipe):
    title          = u'Colectivo Burbuja'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = False
    no_stylesheets = True

    feeds          = [(u'Colectivo Burbuja', u'http://www.colectivoburbuja.org/?feed=rss2')]
    # Keep only the div that holds the article content.
    keep_only_tags    = [dict(name='div', attrs={'id':'main'})]
    # keep_only_tags    = [dict(attrs={'class':['entry-header','entry-content','comments-title','comment-content','reply']})]

    # Let's see what we are downloading...
    def print_version(self, url):
        # We don't look for a print version; the only purpose is to print debug information.
        print("print_version: %s" % url)
        soupinicial = self.index_to_soup(url)
        a = soupinicial.find('div', attrs={'id':'main'})
        print("-" * 78)
        print(a)
        print("-" * 78)
        return url  # return the URL we received, unchanged
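
In case it helps anyone reproduce this, here is a rough standalone version of the same check that does not need calibre. It is only a sketch under a few assumptions: `DivTextExtractor` and `text_in_div` are names I made up, it uses Python 3's standard `html.parser` instead of calibre's `index_to_soup`, the `sample` markup is invented and is not the real page, and it assumes there is a single DIV with ID=main.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text found inside the first <div> with a given id."""

    def __init__(self, div_id):
        super().__init__()
        self.div_id = div_id
        self.depth = 0      # how many <div> levels deep we are inside the target
        self.found = False  # True once the target div has been seen
        self.chunks = []    # text fragments collected from the target div

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        if self.depth > 0:
            # A nested div inside the target: track it so we know when we leave.
            self.depth += 1
        elif not self.found and dict(attrs).get('id') == self.div_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def text_in_div(html, div_id='main'):
    """Return the whitespace-normalised text inside <div id=div_id>, or ''."""
    parser = DivTextExtractor(div_id)
    parser.feed(html)
    parser.close()
    return ' '.join(' '.join(parser.chunks).split())


# Invented sample markup, only to show the call; not the real site's HTML.
sample = (
    '<html><body><div id="header">Menu</div>'
    '<div id="main"><h1>Title</h1>'
    '<div class="entry-content"><p>Body text</p></div></div>'
    '<div id="footer">Foot</div></body></html>'
)
print(text_in_div(sample))
```

If this returns an empty string for a saved article page, the DIV itself would seem to have no text; if it returns text, that would suggest the content is arriving and is being lost at a later step.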
Thanks.