02-29-2012, 05:27 PM | #1 |
Member
Posts: 23
Karma: 12
Join Date: Jul 2011
Device: Cool-er
|
Pages not showing up
I'm trying to build a recipe for http://www.colectivoburbuja.org/?feed=rss2
It looked very easy, because the pages are very "clean": all the data I need is under the DIV with ID=main. I explicitly set auto_cleanup to False and no_stylesheets to True in order to avoid the page not showing up. I am sure the HTML is being retrieved for each downloaded article, because I added a small "trick" to my recipe that prints it (for debugging purposes)... but ALL of the retrieved pages are BLANK. Can somebody please help me understand why? Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1330197191(BasicNewsRecipe):
    title = u'Colectivo Burbuja'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = False
    no_stylesheets = True

    feeds = [(u'Colectivo Burbuja', u'http://www.colectivoburbuja.org/?feed=rss2')]

    keep_only_tags = [dict(name='div', attrs={'id': 'main'})]
    # keep_only_tags = [dict(attrs={'class':['entry-header','entry-content','comments-title','comment-content','reply']})]

    # Let's see what we are downloading...
    def print_version(self, url):
        # We don't search for any print version; the only purpose is printing debug information.
        print("print_version: %s" % url)
        soupinicial = self.index_to_soup(url)
        a = soupinicial.find('div', attrs={'id': 'main'})
        print("------------------------------------------------------------------------------")
        print(a)
        print("------------------------------------------------------------------------------")
        return url  # return the same parameter we received (do nothing)
|
03-03-2012, 03:52 PM | #2 |
Member
Posts: 23
Karma: 12
Join Date: Jul 2011
Device: Cool-er
|
SOLVED:
Just setting "use_embedded_content = False" solved the problem that was driving me crazy. I don't know if this is the "intended" functionality of Calibre: it decides that the XML feed has "enough" content, yet it still fetches the linked pages, processes them, and then discards them, leaving the articles blank. With that setting, the recipe is straightforward. I hope this helps somebody. Best regards. |
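For anyone landing here later, here is a minimal sketch of the recipe with the fix applied, assuming everything else from the first post stays the same. According to the calibre recipe documentation, use_embedded_content defaults to None, which makes calibre guess from the length of the feed entries whether the feed already carries the full articles; setting it to False makes calibre always download the linked pages. The try/except fallback is my own addition, there only so the file can be imported outside calibre, and the debugging print_version is left out.

```python
# Sketch only: the recipe from the first post plus the one-line fix.
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Assumption: stand-in base class so this file can be imported
    # and inspected outside calibre. Not needed inside calibre.
    class BasicNewsRecipe(object):
        pass


class AdvancedUserRecipe1330197191(BasicNewsRecipe):
    title = u'Colectivo Burbuja'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = False
    no_stylesheets = True

    # Never treat the feed text as the article body;
    # always fetch the page each feed entry links to.
    use_embedded_content = False

    feeds = [(u'Colectivo Burbuja', u'http://www.colectivoburbuja.org/?feed=rss2')]

    # Keep only the main content container of each downloaded page.
    keep_only_tags = [dict(name='div', attrs={'id': 'main'})]
```

Saved under a hypothetical name such as colectivo.recipe, it can be tried quickly with "ebook-convert colectivo.recipe .epub --test -vv", which downloads only a couple of articles and prints verbose output.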
|