02-07-2016, 03:22 PM   #1
epubli
Member
epubli began at the beginning.
 
Posts: 18
Karma: 10
Join Date: Nov 2012
Device: Pocketbook611
Modified Engadget recipe

Hi,

The recipe for engadget.com that ships with calibre has been somewhat broken for several months: the fetched articles are often incomplete.

I have modified the recipe so that it downloads the full article text again. It is not perfect, though: I could not figure out how to download the image in the article header reliably, so some articles have a title image and some do not.

Anyway, here is the recipe I am currently using, which works quite well for me. Enjoy!

Code:
#!/usr/bin/env  python2

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2011 Starson17'
'''
engadget.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Engadget(BasicNewsRecipe):
    title                 = u'Engadget'
    __author__            = 'Starson17, modified'
    __version__           = 'v1.00'
    __date__              = '02, January 2016'
    description           = 'Tech news'
    language              = 'en'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    remove_empty_feeds    = True
    compress_news_images = True
    compress_news_images_auto_size = 8
    scale_news_images_to_device = True
    #scale_news_images = (600, 800)
    remove_attributes = ['class']
    #auto_cleanup = True
    #auto_cleanup_keep = '//div[@class="article-text c-gray-1"]|//div[@class="o-title_mark"]'
    #'//h1|//h2|//div[@class="article-text"]'
    # Keep only the elements matching these class strings; the rest of the page is dropped.
    keep_only_tags = [
        dict(name='img', attrs={'class':['stretch-img  hide@m-']}),
        dict(name='div', attrs={'class':['article-text c-gray-1','o-title_mark@tp+ bc-gray-1 col-10-of-12@tl+']}),
    ]
    #remove_tags = [dict(name='div', attrs={'class':['filed_under','post_footer']})]
    #remove_tags_after =  [dict(name='div', attrs={'class':['post_footer']})]

    feeds = [(u'Posts', u'http://www.engadget.com/rss.xml')]

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:small;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
    '''

Last edited by PeterT; 02-07-2016 at 06:18 PM. Reason: Added in code tags to preserve spacing
03-23-2016, 05:05 PM   #2
epubli
Update

Here is an update to the engadget.com recipe. Some articles were not being downloaded in full; this is now fixed. Have fun.

Code:
#!/usr/bin/env  python2

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2011 Starson17'
'''
engadget.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Engadget(BasicNewsRecipe):
    title                 = u'Engadget'
    __author__            = 'Starson17, modified by epubli'
    __version__           = 'v1.10'
    __date__              = '23, March 2016'
    description           = 'Tech news'
    language              = 'en'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    remove_empty_feeds    = True
    compress_news_images = True
    compress_news_images_auto_size = 8
    remove_attributes = ['class']
    # Keep only the elements matching these class strings; the rest of the page is dropped.
    keep_only_tags = [
        dict(name='img', attrs={'class':['stretch-img  hide@m-']}),
        dict(name='div', attrs={'class':['article-text c-gray-1','article-text c-gray-1 no-review','o-title_mark@tp+ bc-gray-1 col-10-of-12@tl+']}),
    ]

    feeds = [(u'Posts', u'http://www.engadget.com/rss.xml')]

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:small;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
    '''

Last edited by kovidgoyal; 03-23-2016 at 10:02 PM.
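
For anyone comparing the two versions: the only change to keep_only_tags between v1.00 and v1.10 is the extra class string 'article-text c-gray-1 no-review'. Below is a minimal standalone sketch of why that one string could matter. It assumes the class attribute is compared as a whole string against each list entry, which is my understanding of the parser calibre used at the time, not something the posts above state. The sample markup and the helper names (ExactClassMatcher, matched_classes) are made up for illustration; only the class strings come from the recipes. It uses the Python 3 standard library only and does not need calibre.

```python
from html.parser import HTMLParser

# Class strings copied from the div entry of keep_only_tags in each version.
V1_00_DIV_CLASSES = [
    'article-text c-gray-1',
    'o-title_mark@tp+ bc-gray-1 col-10-of-12@tl+',
]
V1_10_DIV_CLASSES = [
    'article-text c-gray-1',
    'article-text c-gray-1 no-review',
    'o-title_mark@tp+ bc-gray-1 col-10-of-12@tl+',
]


class ExactClassMatcher(HTMLParser):
    """Record div tags whose full class attribute equals a wanted string.

    Assumption: whole-string comparison, not per-class matching.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.matched = []

    def handle_starttag(self, tag, attrs):
        class_value = dict(attrs).get('class')
        if tag == 'div' and class_value in self.wanted:
            self.matched.append(class_value)


def matched_classes(html, wanted):
    """Return the class strings of all divs that would be kept."""
    parser = ExactClassMatcher(wanted)
    parser.feed(html)
    return parser.matched


# Hypothetical page fragment, not real Engadget markup.
SAMPLE = (
    '<div class="article-text c-gray-1"><p>news body</p></div>'
    '<div class="article-text c-gray-1 no-review"><p>other body</p></div>'
)

print(matched_classes(SAMPLE, V1_00_DIV_CLASSES))
# ['article-text c-gray-1']
print(matched_classes(SAMPLE, V1_10_DIV_CLASSES))
# ['article-text c-gray-1', 'article-text c-gray-1 no-review']
```

Under that assumption, a div carrying the additional 'no-review' class would be skipped by the v1.00 list and kept by the v1.10 list, which would be consistent with the incomplete articles described above.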