View Single Post
Old 07-02-2011, 08:56 PM   #3
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by UnWeave View Post
Is there a way to alter the engadget recipe such that it downloads the entire article?
Here is a new recipe that gets the entire article. The previous recipe just pulled the RSS feed summary and didn't retrieve articles. For most articles, that was enough, but scattered here and there were longer articles with the feed including only the first paragraph.

Kovid - This can probably replace the existing Engadget, but I've named it Engadget_Full
Spoiler:
Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2011 Starson17'
'''
engadget.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Engadget(BasicNewsRecipe):
    title                 = u'Engadget_Full'
    __author__            = 'Starson17'
    __version__           = 'v1.00'
    __date__              = '02, July 2011'
    description           = 'Tech news'
    language              = 'en'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    remove_empty_feeds    = True

    keep_only_tags = [dict(name='div', attrs={'class':['post_content permalink ','post_content permalink alt-post-full']})]
    remove_tags = [dict(name='div', attrs={'class':['filed_under','post_footer']})]
    remove_tags_after =  [dict(name='div', attrs={'class':['post_footer']})]
    
    feeds = [(u'Posts', u'http://www.engadget.com/rss.xml')]

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
		'''
Starson17 is offline   Reply With Quote