Hi,
the Calibre built-in recipe for engadget.com has been partly broken for some months: the fetched articles are often incomplete.
I have modified the recipe so that it downloads the full-text articles again. The recipe is not perfect, though, since I could not figure out how to reliably download the image in the article header. As a result, some news articles have a title image and some do not.
Anyway, here is the recipe I am currently using, which works quite well for me. Enjoy!
Code:
#!/usr/bin/env python2
__license__ = 'GPL v3'
__copyright__ = 'Copyright 2011 Starson17'
'''
engadget.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
class Engadget(BasicNewsRecipe):
    title = u'Engadget'
    __author__ = 'Starson17, modified'
    __version__ = 'v1.00'
    __date__ = '02, January 2016'
    description = 'Tech news'
    language = 'en'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    remove_empty_feeds = True
    compress_news_images = True
    compress_news_images_auto_size = 8
    scale_news_images_to_device = True
    #scale_news_images = (600, 800)
    remove_attributes = ['class']
    #auto_cleanup = True
    #auto_cleanup_keep = '//div[@class="article-text c-gray-1"]|//div[@class="o-title_mark"]'
    #'//h1|//h2|//div[@class="article-text"]'

    keep_only_tags = [
        dict(name='img', attrs={'class': ['stretch-img hide@m-']}),
        dict(name='div', attrs={'class': ['article-text c-gray-1', 'o-title_mark@tp+ bc-gray-1 col-10-of-12@tl+']}),
    ]
    #remove_tags = [dict(name='div', attrs={'class':['filed_under','post_footer']})]
    #remove_tags_after = [dict(name='div', attrs={'class':['post_footer']})]

    feeds = [(u'Posts', u'http://www.engadget.com/rss.xml')]

    extra_css = '''
        h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
        h2{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:small;}
        p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
        body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
        '''
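
The keep_only_tags entries in the recipe depend on the exact class strings the site uses, so the recipe will probably break again whenever Engadget changes its markup. The following is only a rough sketch, not part of the recipe and not something Calibre provides: a stand-alone Python 3 helper (standard library only) that lists the class strings found on img and div tags in a saved copy of an article page, which may help when choosing new selectors. The names ClassCollector and collect_classes and the sample HTML are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Count the raw class strings seen on selected tags (hypothetical helper)."""

    def __init__(self, tags=("img", "div")):
        super().__init__()
        self.tags = set(tags)
        self.seen = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case and attrs as (name, value) pairs.
        if tag not in self.tags:
            return
        class_value = dict(attrs).get("class")
        if class_value:
            self.seen[(tag, class_value)] += 1


def collect_classes(html, tags=("img", "div")):
    """Return a Counter mapping (tag, class string) to number of occurrences."""
    collector = ClassCollector(tags)
    collector.feed(html)
    collector.close()
    return collector.seen


if __name__ == "__main__":
    # Hypothetical snippet standing in for a saved article page.
    sample = (
        '<div class="article-text c-gray-1"><p>Body</p></div>'
        '<img class="stretch-img hide@m-" src="header.jpg">'
    )
    for (tag, class_value), count in collect_classes(sample).items():
        print(tag, repr(class_value), count)
```

If an article comes out without its title image, running this on the saved page should show whether the header img still carries the class string the recipe expects. The recipe itself can then be tried from the command line with something like `ebook-convert engadget.recipe .epub --test -vv` (the file name engadget.recipe is an assumption).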