Thread: Cosmopolitan UK
View Single Post
Old 12-21-2011, 12:38 PM   #2
scissors
Addict
scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.
 
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
slight improvement

Removed annoying thumbnails after love it or loathe it
Spoiler:

Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe
#from calibre import __appname__
from calibre.utils.magick import Image
class AdvancedUserRecipe1306097511(BasicNewsRecipe):
    title          = u'Cosmopolitan UK'
    description = 'Fashion, beauty and Gossip for women from COSMOPOLITAN -UK'

    __author__ = 'Dave Asbury'
    #last update 21/12/11
    # greyscale code by Starson
    cover_url = 'http://www.cosmopolitan.magazine.co.uk/files/4613/2085/8988/Cosmo_Cover3.jpg'
    no_stylesheets = True
    recursion = 20
    oldest_article = 7
    max_articles_per_feed = 20
    remove_empty_feeds = True
    remove_javascript     = True

    preprocess_regexps = [
    (re.compile(r'<!-- Begin tmpl module_competition_offer -->.*?<!-- End tmpl module_competition_offer-->', re.IGNORECASE | re.DOTALL), lambda match: '')]
    language = 'en_GB'


    masthead_url        = 'http://www.cosmopolitan.co.uk/cm/cosmopolitanuk/site_images/header/cosmouk_logo_home.gif'


    keep_only_tags = [
                              dict(attrs={'class' : ['dateAuthor', 'publishDate']}),
                              dict(name='div',attrs ={'id' : ['main_content']})
                              ]
    remove_tags    = [
                              dict(name='div',attrs={'class' : ['blogInfo','viral_toolbar','comment_number','prevEntry nav']}),
                              dict(name='div',attrs={'class' : 'blog_module_about_the_authors'}),
                              dict(attrs={'id': ['breadcrumbs','comment','related_links_list','right_rail','content_sec_fb_more','content_sec_mostpopularstories','content-sec_fb_frame_viewfb_bot']}),
                              dict(attrs={'class' : ['read_liked_that_header','fb_back_next_area']}),
                              dict(name='li',attrs={'class' : 'thumb'})
	          ]

    feeds          = [
        (u'Love & Sex', u'http://www.cosmopolitan.co.uk/love-sex/rss/'), (u'Men', u'http://cosmopolitan.co.uk/men/rss/'), (u'Fashion', u'http://cosmopolitan.co.uk/fashion/rss/'), (u'Hair & Beauty', u'http://cosmopolitan.co.uk/beauty-hair/rss/'), (u'LifeStyle', u'http://cosmopolitan.co.uk/lifestyle/rss/'), (u'Cosmo On Campus', u'http://cosmopolitan.co.uk/campus/rss/'), (u'Celebrity Gossip', u'http://cosmopolitan.co.uk/celebrity-gossip/rss/')]

    def postprocess_html(self, soup, first):
        #process all the images
        for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
            iurl = tag['src']
            img = Image()
            img.open(iurl)
            if img < 0:
                raise RuntimeError('Out of memory')
            img.type = "GrayscaleType"
            img.save(iurl)
        return soup
scissors is offline   Reply With Quote