I'm working on fixing my InformationWeek recipe. It gets the regular page articles (and if an article spans more than one page, only the first page). I set it up to pull the print version (which is the full article), but it's still not getting the print version.
Here is the recipe:
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.web.feeds import Feed


class InformationWeek(BasicNewsRecipe):
    title = u'InformationWeek'
    oldest_article = 3
    max_articles_per_feed = 150
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True
    remove_javascript = True
    use_embedded_content = False

    feeds = [
        (u'InformationWeek - Stories', u'http://www.informationweek.com/rss/pheedo/all_story_blog.xml?cid=RSSfeed_IWK_ALL'),
        (u'InformationWeek - News', u'http://www.informationweek.com/rss/pheedo/news.xml?cid=RSSfeed_IWK_News'),
        (u'InformationWeek - Personal Tech', u'http://www.informationweek.com/rss/pheedo/personaltech.xml?cid=RSSfeed_IWK_Personal_Tech'),
        (u'InformationWeek - Software', u'http://www.informationweek.com/rss/pheedo/software.xml?cid=RSSfeed_IWK_Software'),
        (u'InformationWeek - Hardware', u'http://www.informationweek.com/rss/pheedo/hardware.xml?cid=RSSfeed_IWK_Hardware')
    ]

    def parse_feeds(self):
        # Let calibre build the feeds, then drop the healthcare articles.
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            # Iterate over a copy so removing entries is safe.
            for article in feed.articles[:]:
                self.log('article.title is: ', article.title)
                if 'healthcare' in article.title or 'healthcare' in article.url:
                    feed.articles.remove(article)
        return feeds

    def print_version(self, url):
        # Drop the query string and ask for the printer-friendly page.
        # partition() keeps the whole URL when it contains no '?';
        # rpartition() would leave main empty in that case.
        main, sep, unneeded = url.partition('?')
        return main + '?printer_friendly=this-page'
Here is one of the original article URLs:
http://www.informationweek.com/socia...SSfeed_IWK_ALL
and here is the printer version URL:
http://www.informationweek.com/socia...ndly=this-page
Currently the recipe strips the query string (which changes depending on which feed the article comes from) and appends ?printer_friendly=this-page, but it still fails to download the printer version of the article.
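To show the rewrite on its own, here is a minimal standalone sketch that runs outside calibre. The function name to_print_url and the sample URL are made up for illustration (the real article links are longer), and it uses partition() rather than rpartition() so that a URL with no '?' is kept whole instead of being reduced to an empty string. It only checks the string handling; it does not tell me whether the URL calibre actually passes to print_version looks like this.

```python
def to_print_url(url):
    """Replace the query string with the printer-friendly parameter."""
    # partition() returns (url, '', '') when there is no '?', so the
    # base URL survives either way.
    main, _sep, _query = url.partition('?')
    return main + '?printer_friendly=this-page'


# Hypothetical article URL, not a real InformationWeek link.
sample = 'http://www.informationweek.com/example/article/1234?cid=RSSfeed_IWK_ALL'
print(to_print_url(sample))
```

Logging the url argument inside print_version with self.log would be one way to compare what the recipe really receives against this sample.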
Any ideas?