Pages not showing up

julio:map · 02-29-2012, 05:27 PM

I'm trying to build a recipe for http://www.colectivoburbuja.org/?feed=rss2

Things seemed very easy, because the pages are very "clean"...

All the data that I need is under the DIV ID=main

I explicitly set auto_cleanup to FALSE, and no_stylesheets to TRUE in order to avoid the page not showing up.

I am sure that the HTML code is being retrieved (for each article downloaded) because I added a small "trick" to my recipe to print it (for debugging purposes)...

... but ALL of the pages retrieved are BLANK.

Please can somebody help me understand why?

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1330197191(BasicNewsRecipe):
    title          = u'Colectivo Burbuja'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = False

    no_stylesheets = True

    feeds          = [(u'Colectivo Burbuja', u'http://www.colectivoburbuja.org/?feed=rss2')]
    keep_only_tags    = [dict(name='div', attrs={'id':'main'})]
#    keep_only_tags    = [dict(attrs={'class':['entry-header','entry-content','comments-title','comment-content','reply']})]

    # Let's see what we are downloading...
    def print_version(self, url):
        # We don't look for a print version; the only purpose is printing debug information.
        print("print_version:", url)
        soupinicial = self.index_to_soup(url)
        a = soupinicial.find('div', attrs={'id': 'main'})
        print("------------------------------------------------------------------------------")
        print(a)
        print("------------------------------------------------------------------------------")
        return url  # return the same URL we received (do nothing)

Thanks.

julio:map · 03-03-2012, 03:52 PM

SOLVED:

Just using "use_embedded_content = False" solved the problem that was driving me crazy.

I don't know if this is the "intended" functionality of Calibre.

It seems to decide that the feed's XML already has "enough" content, yet it still fetches the linked pages, processes them, and then discards them, leaving the articles blank.

After doing that the recipe is straightforward.
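
For anyone landing here later, the fixed recipe would look roughly like the sketch below. It is the recipe from the first post with the one extra setting added and the debug print_version removed. The class name ColectivoBurbuja is made up for illustration, and the try/except stand-in is only there so the snippet can be loaded outside calibre. As far as the calibre documentation describes it, use_embedded_content defaults to None, in which case calibre guesses from the length of the text embedded in the feed whether the feed already carries full articles. My assumption (not confirmed in this thread) is that keep_only_tags then gets applied to that embedded text, which has no div with id "main", so nothing is left.

```python
try:
    # Inside calibre this import succeeds.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Hypothetical stand-in so the sketch can be loaded outside calibre.
    class BasicNewsRecipe(object):
        pass


class ColectivoBurbuja(BasicNewsRecipe):  # class name is illustrative
    title = u'Colectivo Burbuja'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = False
    no_stylesheets = True

    # Always download the linked article pages instead of relying on
    # whatever text is embedded in the feed itself.
    use_embedded_content = False

    feeds = [(u'Colectivo Burbuja', u'http://www.colectivoburbuja.org/?feed=rss2')]
    keep_only_tags = [dict(name='div', attrs={'id': 'main'})]
```

Setting the flag explicitly avoids depending on the guess, which can change from one feed to another depending on how much text the feed includes.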

I hope this helps somebody.

Best regards.
