Register Guidelines E-Books Search Today's Posts Mark Forums Read

Go Back   MobileRead Forums > E-Book Software > Calibre > Recipes

Notices

Reply
 
Thread Tools Search this Thread
Old 06-01-2018, 07:34 AM   #1
Phoebus
Member
Phoebus began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Aug 2015
Device: Kobo Aura H2O
Cracked.com - May 2018 update

Cracked.com have changed their code again and broke the feed. This is the code I use:

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class Cracked(BasicNewsRecipe):
    title = u'Cracked.com Weekly download'
    __author__ = 'Update June 2018'
    language = 'en'
    description = "America's Only HumorSite since 1958"
    publisher = 'Cracked'
    category = 'comedy, lists'
    oldest_article =9  # days
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'utf-8'
    remove_javascript = True
    use_embedded_content = False
    recursions = 11
    remove_attributes = ['size', 'style']

    feeds = [(u'Articles', u'http://feeds.feedburner.com/CrackedRSS/')]

    conversion_options = {
        'comment': description, 'tags': category, 'publisher': publisher, 'language': language
    }
   
    keep_only_tags = [  
                    dict(name='div', attrs={'class': [
                                                'content-content',
                                                'contentWrapper',
                                                'content-header',
                                                        ]}),
                    dict(name='article', attrs={'class': [
                                                'module article dropShadowBottomCurved',
                                                'module blog dropShadowBottomCurved',
                                                            ]}),
                      ]

    remove_tags = [
        dict(name='section', attrs={'class': ['socialTools', 'quickFixModule', 'continue-reading']}),
        dict(attrs={'class':['socialShareAfterContent', 'socialShareModule', 'continue-reading', 'social-share-bottom list-inline']}),
        dict(name='div', attrs={'id': ['relatedArticle', 'content-card-top', 'recommendedForYourPleasure', 'navbar']}),
        dict(name='div', attrs={'class': ['comments-wrap', 'container continue-reading', 'row breadcrumbs-wrapper']}),
        dict(name='h4', attrs={'class': ['mobile-ad-label']}),
        dict(name='ul', attrs={'id': [
                                'breadcrumbs',
                                'socialShare',
                                ]}),       
        dict(name='div', attrs={'class': ['bannerAd hidden-sm hidden-md hidden-lg introAd']})
    ]

    def is_link_wanted(self, url, a):
        return a['class'] == 'next' and a.findParent('nav', attrs={'class':'PaginationContent'}) is not None

    def preprocess_html(self, soup):
        for img in soup.findAll('img', attrs={'data-img':True}):
            img['src'] = img['data-img']
        for img in soup.findAll('img', attrs={'data-original':True}):
            img['src'] = img['data-original']                              
        for img in soup.findAll('img', attrs={'data-src':True}):
            img['src'] = img['data-src'] 
        return soup
    
    def postprocess_html(self, soup, first_fetch):
        for div in soup.findAll(attrs={'class':'PaginationContent'}):
            div.extract()
        if not first_fetch:
            for div in soup.findAll(attrs={'class':'meta'}):
                div.extract()
 
        return soup
Phoebus is offline   Reply With Quote
Old 06-01-2018, 10:50 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.
 
kovidgoyal's Avatar
 
Posts: 33,612
Karma: 10209576
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
thanks, updated.
kovidgoyal is offline   Reply With Quote
Old 06-29-2018, 08:31 AM   #3
Phoebus
Member
Phoebus began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Aug 2015
Device: Kobo Aura H2O
Updated slightly following a site update that added unnecessary social share links and recommendations.

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class Cracked(BasicNewsRecipe):
    title = u'Cracked.com Weekly download'
    __author__ = 'Update June 2018'
    language = 'en'
    description = "America's Only HumorSite since 1958"
    publisher = 'Cracked'
    category = 'comedy, lists'
    oldest_article =15  # days
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'utf-8'
    remove_javascript = True
    use_embedded_content = False
    recursions = 11
    remove_attributes = ['size', 'style']

    feeds = [(u'Articles', u'http://feeds.feedburner.com/CrackedRSS/')]

    conversion_options = {
        'comment': description, 'tags': category, 'publisher': publisher, 'language': language
    }
   
    keep_only_tags = [  
                    dict(name='div', attrs={'class': [
                                                'content-content',
                                                'contentWrapper',
                                                'content-header',
                                                        ]}),
                    dict(name='article', attrs={'class': [
                                                'module article dropShadowBottomCurved',
                                                'module blog dropShadowBottomCurved',
                                                            ]}),
                      ]

    remove_tags = [
        dict(name='section', attrs={'class': ['socialTools', 'quickFixModule', 'continue-reading']}),
        dict(attrs={'class':['socialShareAfterContent', 'socialShareModule', 'continue-reading', 'social-share-bottom list-inline']}),
        dict(name='div', attrs={'id': ['relatedArticle', 'content-card-top', 'recommendedForYourPleasure', 'navbar', 'flashbackModuleWrap', 'moreRecommendedArticles']}),
        dict(name='div', attrs={'class': ['comments-wrap', 'container continue-reading', 'row breadcrumbs-wrapper', 'btn-social-favorites col', 'hidden-social col', 'ajax-loader comments-loader-bottom', 'flashback-module-new', 'card-md-list card-sm-list card-xs-list', 'popular-module card-md-list card-sm-list card-xs-list', 'col-md-12 list-title', 'content-cards d-flex flex-wrap', 'google-plus btn btn-social', 'twitter btn btn-socia', 'facebook btn btn-social', 'row social-share-top-wrapper']}),
        dict(name='h4', attrs={'class': ['mobile-ad-label']}),
        dict(name='ul', attrs={'id': [
                                'breadcrumbs',
                                'socialShare',
                                ]}),       
        dict(name='ul', attrs={'class': ['list-unstyled offcanvas-sections']}),
        dict(name='div', attrs={'class': ['bannerAd hidden-sm hidden-md hidden-lg introAd']})
    ]

    def is_link_wanted(self, url, a):
        return a['class'] == 'next' and a.findParent('nav', attrs={'class':'PaginationContent'}) is not None

    def preprocess_html(self, soup):
        for img in soup.findAll('img', attrs={'data-img':True}):
            img['src'] = img['data-img']
        for img in soup.findAll('img', attrs={'data-original':True}):
            img['src'] = img['data-original']                              
        for img in soup.findAll('img', attrs={'data-src':True}):
            img['src'] = img['data-src'] 
        return soup
    
    def postprocess_html(self, soup, first_fetch):
        for div in soup.findAll(attrs={'class':'PaginationContent'}):
            div.extract()
        if not first_fetch:
            for div in soup.findAll(attrs={'class':'meta'}):
                div.extract()
 
        return soup
Phoebus is offline   Reply With Quote
Old 06-29-2018, 08:43 AM   #4
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.
 
kovidgoyal's Avatar
 
Posts: 33,612
Karma: 10209576
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
thanks, updated
kovidgoyal is offline   Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Collected Works Joyce, James: Complete Works | v.12.0 | Update 8 Apr 2018 pynch ePub Books 117 04-28-2018 01:43 PM
FTI Max2 update available, 2018-03-19_13_47_1.8.3_9269185/1043:user/release-keys everalm Onyx Boox 7 04-06-2018 04:33 PM
Cracked.com Calia Recipes 0 08-29-2014 12:48 AM
Screen Cracked omro Astak EZReader 13 05-07-2010 12:39 PM
DH cracked my K2 lala Amazon Kindle 6 02-22-2010 05:43 PM


All times are GMT -4. The time now is 09:20 AM.


MobileRead.com is a privately owned, operated and funded community.