07-27-2013, 08:36 PM   #1
Camper65
Device: Kindle wifi; Dell 2in1
Problem getting print_version to be pulled

I'm working on fixing my InformationWeek recipe. At the moment it fetches the regular article pages, and for multi-page articles it gets only the first page. I added a print_version method so it would pull the print version (which contains the full article), but it's still not getting the print version.

Here is the recipe:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.web.feeds import Feed

class InformationWeek(BasicNewsRecipe):
    title          = u'InformationWeek'
    oldest_article = 3
    max_articles_per_feed = 150
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True
    remove_javascript = True
    use_embedded_content   = False


    feeds          = [
                          (u'InformationWeek - Stories', u'http://www.informationweek.com/rss/pheedo/all_story_blog.xml?cid=RSSfeed_IWK_ALL'),
                          (u'InformationWeek - News', u'http://www.informationweek.com/rss/pheedo/news.xml?cid=RSSfeed_IWK_News'),
                          (u'InformationWeek - Personal Tech', u'http://www.informationweek.com/rss/pheedo/personaltech.xml?cid=RSSfeed_IWK_Personal_Tech'),
                          (u'InformationWeek - Software', u'http://www.informationweek.com/rss/pheedo/software.xml?cid=RSSfeed_IWK_Software'),
                          (u'InformationWeek - Hardware', u'http://www.informationweek.com/rss/pheedo/hardware.xml?cid=RSSfeed_IWK_Hardware')
                     ]

    def parse_feeds(self):
        # Fetch the feeds as usual, then drop healthcare articles.
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            # Iterate over a copy so removing from the original list is safe.
            for article in feed.articles[:]:
                self.log('article.title is: ', article.title)
                if 'healthcare' in article.title or 'healthcare' in article.url:
                    feed.articles.remove(article)
        return feeds

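    def print_version(self, url):
        # partition() keeps the whole URL when there is no '?';
        # rpartition() would leave main empty for such URLs.
        main, sep, unneeded = url.partition('?')
        return main + '?printer_friendly=this-page'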

Here is one of the original article URLs:

http://www.informationweek.com/socia...SSfeed_IWK_ALL

and here is the printer version URL:
http://www.informationweek.com/socia...ndly=this-page

The recipe currently strips the query string (which varies depending on which feed the article comes from) and appends ?printer_friendly=this-page, but it still fails to download the printer version of the article.
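
For reference, the URL rewrite described above can be tried on its own, outside calibre. This is only a minimal sketch: the function name to_print_url and the example.com address are made up for illustration and are not taken from the site or from calibre.

```python
def to_print_url(url):
    """Drop the query string and append the printer-friendly flag."""
    # partition() splits on the first '?'; if there is none,
    # head is simply the whole URL.
    head, _sep, _query = url.partition('?')
    return head + '?printer_friendly=this-page'


if __name__ == '__main__':
    # Hypothetical URL, used only to show the transformation.
    sample = 'http://example.com/some/article/123?cid=RSSfeed_EXAMPLE'
    print(to_print_url(sample))
```

If the rewritten URL looks right here, the string handling itself is presumably not what is going wrong.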

Any ideas?