Old 12-28-2011, 07:07 AM   #1
scissors
Device: prs300, kindle keyboard 3g
The Sun UK Recipe - Google reader NOT required

A recipe for The Sun newspaper (UK).
It uses feed43 feeds, so it is a straight recipe. Google Reader is not required.

Code:
import re
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.utils.magick import Image
class AdvancedUserRecipe1325006965(BasicNewsRecipe):

    title = u'The Sun UK'
    description = 'A Recipe for The Sun tabloid UK'
    __author__ = 'Dave Asbury'
    # last updated 28/12/11
    language = 'en_GB'
    cover_url = 'http://www.thesun.co.uk/img/global/new-masthead-logo.png'
    masthead_url = 'http://www.thesun.co.uk/sol/img/global/Sun-logo.gif'

    # Feed limits: articles up to one day old, at most 10 per feed.
    oldest_article = 1
    max_articles_per_feed = 10
    remove_empty_feeds = True

    # Page clean-up and encoding.
    remove_javascript = True
    no_stylesheets = True
    encoding = 'cp1252'
    # encoding = 'iso-8859-1'
    # auto_cleanup = True
    # articles_are_obfuscated = True

    extra_css = '''
        body { text-align: justify; font-family: Arial,Helvetica,sans-serif; font-size: 11px; font-size-adjust: none; font-stretch: normal; font-style: normal; font-variant: normal; font-weight: normal; }
    '''
    
    # Strip the copyright footer before the page is parsed.
    preprocess_regexps = [
        (re.compile(r'<div class="foot-copyright".*?</div>', re.IGNORECASE | re.DOTALL), lambda match: ''),
    ]

    # Keep only the headline, sub-headline, centred block and article body.
    keep_only_tags = [
        dict(name='h1'),
        dict(name='h2', attrs={'class': 'medium centered'}),
        dict(name='div', attrs={'class': 'text-center'}),
        dict(name='div', attrs={'id': 'bodyText'}),
        # dict(name='p'),
    ]

    # Drop lightbox widgets, flash prompts and other page furniture.
    remove_tags = [
        # dict(name='head'),
        dict(attrs={'class': ['mystery-meat-link', 'ltbx-container', 'ltbx-var ltbx-hbxpn', 'ltbx-var ltbx-nav-loop', 'ltbx-var ltbx-url']}),
        dict(name='div', attrs={'class': 'cf'}),
        dict(attrs={'title': 'download flash'}),
        dict(attrs={'style': 'padding: 5px'}),
    ]

	
    # One feed43-generated feed per section of the paper.
    feeds = [
        (u'News', u'http://feed43.com/8203386003128155.xml'),
        (u'Politics', u'http://feed43.com/8778630438181348.xml'),
        (u'Sport', u'http://feed43.com/6485060800200516.xml'),
        (u'Bizarre', u'http://feed43.com/0474513283738222.xml'),
        (u'Film', u'http://feed43.com/3205080555347541.xml'),
        (u'Music', u'http://feed43.com/1280183636043584.xml'),
        (u'Dear Deidre', u'http://feed43.com/0322057870763024.xml'),
    ]

    def postprocess_html(self, soup, first):
        # Convert every image that has a src attribute to grayscale, in place.
        # src is expected to be a local file path at this stage.
        for tag in soup.findAll(lambda tag: tag.name.lower() == 'img' and tag.has_key('src')):
            iurl = tag['src']
            img = Image()
            img.open(iurl)
            if img < 0:
                raise RuntimeError('Out of memory')
            img.type = "GrayscaleType"
            img.save(iurl)
        return soup
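
The footer-stripping pattern in preprocess_regexps can be tried outside calibre. This is only a rough sketch: the pattern is copied from the recipe, but strip_footer and the sample markup are made up for illustration and are not taken from thesun.co.uk.

```python
import re

# Same pattern as the recipe's preprocess_regexps entry.
FOOTER_RE = re.compile(r'<div class="foot-copyright".*?</div>',
                       re.IGNORECASE | re.DOTALL)


def strip_footer(html):
    """Remove the copyright footer div (hypothetical helper for testing)."""
    return FOOTER_RE.sub('', html)


# Hypothetical sample markup: upper-case tags and a footer spanning
# several lines, to exercise IGNORECASE and DOTALL.
sample = (
    '<div id="bodyText"><p>Story text.</p></div>\n'
    '<DIV CLASS="foot-copyright">\n  Copyright notice\n</DIV>\n'
    '<p>After footer.</p>'
)

cleaned = strip_footer(sample)
print(cleaned)
```

Because the match is non-greedy, it ends at the first closing div. If the footer ever contains a nested div, part of it will be left behind and the pattern would need adjusting.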

Last edited by scissors; 12-30-2011 at 12:49 PM. Reason: Description said google required. it's not