Old 02-29-2012, 05:27 PM   #1
julio:map
Pages not showing up

I'm trying to build a recipe for http://www.colectivoburbuja.org/?feed=rss2

Things seemed very easy, because the pages are very "clean"...

All the data that I need is inside the div with id="main".

I explicitly set auto_cleanup to False and no_stylesheets to True so that neither setting would cause the pages to go missing.

I am sure that the HTML is being retrieved for each downloaded article, because I added a small "trick" to my recipe that prints it (for debugging purposes)...

... but ALL of the pages retrieved are BLANK.

Please can somebody help me understand why?

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1330197191(BasicNewsRecipe):
    title                 = u'Colectivo Burbuja'
    oldest_article        = 7
    max_articles_per_feed = 100
    auto_cleanup          = False
    no_stylesheets        = True

    feeds          = [(u'Colectivo Burbuja', u'http://www.colectivoburbuja.org/?feed=rss2')]
    # Keep only the main content container of each article page.
    keep_only_tags = [dict(name='div', attrs={'id': 'main'})]
#    keep_only_tags    = [dict(attrs={'class':['entry-header','entry-content','comments-title','comment-content','reply']})]

    # Let's see what we are downloading...
    def print_version(self, url):
        # We don't look for a print version; the only purpose is to print debug information.
        print("print_version:", url)
        soupinicial = self.index_to_soup(url)
        main_div = soupinicial.find('div', attrs={'id': 'main'})
        print("-" * 78)
        print(main_div)
        print("-" * 78)
        return url  # return the same URL we received (do nothing)
Thanks.
Old 03-03-2012, 03:52 PM   #2
julio:map
SOLVED:

Just setting "use_embedded_content = False" in the recipe solved the problem that was driving me crazy.

I don't know if this is the "intended" functionality of Calibre.

As far as I can tell, Calibre decides that the RSS feed itself has "enough" content, so although it still fetches and processes the linked pages, it then discards them, leaving the articles blank.
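
To make the three states of the setting concrete, here is a simplified model of how I understand it: None means Calibre guesses from the feed, True always uses the feed content, and False always downloads the linked pages. The function name uses_feed_content and the parameter feed_looks_complete are my own inventions for illustration; this is not Calibre's actual code.

```python
def uses_feed_content(use_embedded_content, feed_looks_complete):
    """Simplified model of the setting (hypothetical helper, not calibre code).

    use_embedded_content: None, True or False, as set in the recipe.
    feed_looks_complete:  stands in for calibre's guess that the feed
                          already carries the full article text.
    """
    if use_embedded_content is None:
        # Default: fall back to the guess made from the feed itself.
        return feed_looks_complete
    # An explicit True/False always wins over the guess.
    return bool(use_embedded_content)


# Default setting with a feed that looks complete: feed content is used.
print(uses_feed_content(None, True))
# Explicit False: the linked pages are used, whatever the feed looks like.
print(uses_feed_content(False, True))
```

Under this model, my recipe with the default setting took the first path, which would explain why a keep_only_tags filter written for the web pages found nothing to keep.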

After doing that the recipe is straightforward.
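
For anyone who wants something to start from, this is a minimal sketch of what the recipe looks like with the fix applied. The class name ColectivoBurbuja is just my choice, and the try/except around the import is only there so the snippet can also be loaded outside Calibre (inside Calibre the real BasicNewsRecipe is used). I have left out the debugging print_version, since it is no longer needed.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Assumption: stand-in base class so the sketch loads outside calibre.
    class BasicNewsRecipe(object):
        pass


class ColectivoBurbuja(BasicNewsRecipe):
    title                 = u'Colectivo Burbuja'
    oldest_article        = 7
    max_articles_per_feed = 100
    auto_cleanup          = False
    no_stylesheets        = True

    # The fix: never treat the RSS items as the full articles,
    # always download the linked pages instead.
    use_embedded_content  = False

    feeds = [(u'Colectivo Burbuja', u'http://www.colectivoburbuja.org/?feed=rss2')]

    # Keep only the main content container of each downloaded page.
    keep_only_tags = [dict(name='div', attrs={'id': 'main'})]
```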

I hope this helps somebody.

Best regards.