MobileRead Forums > E-Book Software > Calibre > Recipes
02-07-2016, 03:22 PM   #1
epubli
Enthusiast
Posts: 25
Karma: 10
Join Date: Nov 2012
Device: Pocketbook Inkpad 3
Modified engadget recipe

Hi,

The built-in Calibre recipe for engadget.com has been partly broken for some months: the fetched articles are often incomplete.

I have modified the recipe so that it downloads the full-text articles again. It is not perfect, though: I could not figure out how to download the article header image reliably, so some news articles get a title image and some do not.

Anyway, here is the recipe I am currently using; it works quite well for me. Enjoy!

Code:
#!/usr/bin/env  python2

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2011 Starson17'
'''
engadget.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Engadget(BasicNewsRecipe):
    title                 = u'Engadget'
    __author__            = 'Starson17, modified'
    __version__           = 'v1.00'
    __date__              = '02, January 2016'
    description           = 'Tech news'
    language              = 'en'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    remove_empty_feeds    = True
    compress_news_images = True
    compress_news_images_auto_size = 8
    scale_news_images_to_device = True
    #scale_news_images = (600, 800)
    remove_attributes = ['class']
    #auto_cleanup = True
    #auto_cleanup_keep = '//div[@class="article-text c-gray-1"]|//div[@class="o-title_mark"]'
    #'//h1|//h2|//div[@class="article-text"]'
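    # Keep only the header image (not found on every article), the title block
    # and the article body; everything else on the page is dropped.
    keep_only_tags = [
        dict(name='img', attrs={'class':['stretch-img  hide@m-']}),
        dict(name='div', attrs={'class':['article-text c-gray-1','o-title_mark@tp+ bc-gray-1 col-10-of-12@tl+']}),
    ]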
    #remove_tags = [dict(name='div', attrs={'class':['filed_under','post_footer']})]
    #remove_tags_after =  [dict(name='div', attrs={'class':['post_footer']})]

    feeds = [(u'Posts', u'http://www.engadget.com/rss.xml')]

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:small;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
    '''
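
For readers wondering how the keep_only_tags list above decides what ends up in the e-book, here is a standalone Python 3 sketch (not calibre code) of the same idea: keep an element only if its tag name and its whole class attribute match one of the listed selectors, and drop everything else. The whole-string comparison is an assumption inferred from the space-separated class strings in the recipe; KeepOnlyParser, keep_only() and the sample page are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change nesting depth.
VOID_TAGS = {'img', 'br', 'hr', 'meta', 'link', 'input'}


class KeepOnlyParser(HTMLParser):
    """Collects text inside matching containers and the src of matching void
    elements (e.g. <img>). A match needs the same tag name and a class
    attribute equal to one of the listed strings. Assumes well-formed markup
    in which every non-void tag is closed."""

    def __init__(self, selectors):
        super().__init__(convert_charrefs=True)
        self.selectors = selectors
        self.depth = 0   # > 0 while inside a kept container
        self.kept = []   # collected fragments, in document order

    def _matches(self, tag, attrs):
        cls = dict(attrs).get('class')
        return any(tag == name and cls in classes
                   for name, classes in self.selectors)

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1          # nested element inside a kept one
        elif self._matches(tag, attrs):
            if tag in VOID_TAGS:
                self.kept.append('[%s %s]' % (tag, dict(attrs).get('src', '')))
            else:
                self.depth = 1           # entering a kept container

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.kept.append(data.strip())


def keep_only(html, selectors):
    parser = KeepOnlyParser(selectors)
    parser.feed(html)
    parser.close()
    return parser.kept


# Hypothetical selectors modelled on the recipe's keep_only_tags list.
SELECTORS = [
    ('img', {'stretch-img  hide@m-'}),
    ('div', {'article-text c-gray-1',
             'o-title_mark@tp+ bc-gray-1 col-10-of-12@tl+'}),
]

# Hypothetical page: only the header image and the article body should survive.
PAGE = (
    '<html><body>'
    '<div class="nav">Menu</div>'
    '<img class="stretch-img  hide@m-" src="hero.jpg">'
    '<div class="article-text c-gray-1"><p>First paragraph.</p><p>Second.</p></div>'
    '<div class="comments">Reader comments</div>'
    '</body></html>'
)

print(keep_only(PAGE, SELECTORS))
# ['[img hero.jpg]', 'First paragraph.', 'Second.']
```

Note that under this assumption an element whose class attribute differs even slightly (an extra class, different spacing) is silently dropped, which would explain incomplete articles after a site redesign.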

Last edited by PeterT; 02-07-2016 at 06:18 PM. Reason: Added in code tags to preserve spacing
03-23-2016, 05:05 PM   #2
epubli
Enthusiast
Update

Here is an update to the engadget.com recipe. Some articles were still not downloaded in full; this is fixed now. Have fun.

Code:
#!/usr/bin/env  python2

__license__   = 'GPL v3'
__copyright__ = 'Copyright 2011 Starson17'
'''
engadget.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Engadget(BasicNewsRecipe):
    title                 = u'Engadget'
    __author__            = 'Starson17, modified by epubli'
    __version__           = 'v1.10'
    __date__              = '23, March 2016'
    description           = 'Tech news'
    language              = 'en'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    remove_empty_feeds    = True
    compress_news_images = True
    compress_news_images_auto_size = 8
    remove_attributes = ['class']
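    # Keep only the header image, the title block and the article body;
    # 'article-text c-gray-1 no-review' is the body variant added in v1.10.
    keep_only_tags = [
        dict(name='img', attrs={'class':['stretch-img  hide@m-']}),
        dict(name='div', attrs={'class':['article-text c-gray-1','article-text c-gray-1 no-review','o-title_mark@tp+ bc-gray-1 col-10-of-12@tl+']}),
    ]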

    feeds = [(u'Posts', u'http://www.engadget.com/rss.xml')]

    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:small;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
    '''
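
A likely explanation of the v1.10 fix, assuming each listed class string is compared against the element's whole class attribute: the body of some articles presumably carries the extra class no-review, so 'article-text c-gray-1' alone no longer matches and the text is dropped. The standalone sketch below contrasts that whole-string test with token-based matching; exact_match(), token_match() and the sample class attribute are hypothetical, not calibre API.

```python
def exact_match(class_attr, wanted):
    """Whole-string comparison: the class attribute must equal one listed entry."""
    return class_attr in wanted


def token_match(class_attr, required):
    """Token comparison: every required class name must be present; extra
    class names on the element are allowed."""
    return set(required).issubset(class_attr.split())


# Body-container class strings from the v1.00 and v1.10 keep_only_tags lists.
V100_CLASSES = ['article-text c-gray-1']
V110_CLASSES = ['article-text c-gray-1', 'article-text c-gray-1 no-review']

# Hypothetical class attribute on the body of an affected article.
body_class = 'article-text c-gray-1 no-review'

print(exact_match(body_class, V100_CLASSES))   # False: body would be dropped in v1.00
print(exact_match(body_class, V110_CLASSES))   # True: matched by the new entry
print(token_match(body_class, ['article-text', 'c-gray-1']))  # True, no variant list needed
```

Token matching would catch both variants (and any future extra class) with a single entry, at the cost of possibly keeping other elements that happen to share those class names.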

Last edited by kovidgoyal; 03-23-2016 at 10:02 PM.

Similar Threads
Thread Thread Starter Forum Replies Last Post
Override 'calibre' as converted feed's author in modified recipe sk365 Recipes 11 01-21-2025 07:07 AM
Trying to make a modified version of the recipe for "The Atlantic" camiller Recipes 3 02-14-2012 03:59 PM
Modified Recipe Tweakers.net - need help roedi06 Recipes 4 01-17-2012 07:42 AM
Modified Reuters News Recipe Submission rogerx Recipes 1 08-25-2011 10:19 PM
Modified Irish Times Recipe phiznlil Recipes 2 04-01-2011 06:27 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.