Quote:
Originally Posted by UnWeave
Is there a way to alter the engadget recipe such that it downloads the entire article?
|
Here is a new recipe that gets the entire article. The previous recipe just pulled the RSS feed summary and didn't retrieve articles. For most articles, that was enough, but scattered here and there were longer articles with the feed including only the first paragraph.
Kovid - This can probably replace the existing Engadget, but I've named it Engadget_Full
Spoiler:
Code:
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = 'Copyright 2011 Starson17'
'''
engadget.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
class Engadget(BasicNewsRecipe):
title = u'Engadget_Full'
__author__ = 'Starson17'
__version__ = 'v1.00'
__date__ = '02, July 2011'
description = 'Tech news'
language = 'en'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
remove_javascript = True
remove_empty_feeds = True
keep_only_tags = [dict(name='div', attrs={'class':['post_content permalink ','post_content permalink alt-post-full']})]
remove_tags = [dict(name='div', attrs={'class':['filed_under','post_footer']})]
remove_tags_after = [dict(name='div', attrs={'class':['post_footer']})]
feeds = [(u'Posts', u'http://www.engadget.com/rss.xml')]
extra_css = '''
h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
'''