#1
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
The Sun UK
A recipe for The Sun (UK tabloid), based on the Google Reader recipe.
It goes through Google Reader because the feeds kept disappearing when fetched directly (I think the site monitors access?). Anyway, I set up a Gmail account called sunreader solely for the reader. I then subscribed to the Sun's RSS feeds at http://www.thesun.co.uk/sol/homepage...icle247949.ece; examples are:
News http://www.thesun.co.uk/sol/homepage...icle312900.ece
Sport http://www.thesun.co.uk/sol/homepage...icle247732.ece
ShowBiz http://www.thesun.co.uk/sol/homepage...cle1999685.ece
Bizarre http://www.thesun.co.uk/sol/homepage...icle247767.ece
Then, in Google Reader, click Feed settings for each subscribed feed and select New folder. The name you enter here is the name that will appear in the TOC.
Code:
```python
import urllib, re, mechanize
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre import __appname__
from calibre.utils.magick import Image, PixelWand

class GoogleReader(BasicNewsRecipe):
    title = 'The Sun UK Via Google Reader'
    #last updated 2/11/11 images to greyscale - by Starson17
    cover_url = 'http://www.thesun.co.uk/img/global/new-masthead-logo.png'
    description = 'A Recipe for The Sun tabloid UK using the google reader recipe. You need to set up a gmail account solely for the reader, then subscribe to the suns RSS feeds at http://www.thesun.co.uk/sol/homepage/hygiene/rss_sign_up/article247949.ece'
    needs_subscription = True
    __author__ = ' Dave Asbury, davec, rollercoaster, Starson17'
    base_url = 'http://www.google.com/reader/atom/'
    oldest_article = 1
    max_articles_per_feed = 20
    get_options = '?n=%d&xt=user/-/state/com.google/read' % max_articles_per_feed
    # use_embedded_content = True
    masthead_url = 'http://www.thesun.co.uk/sol/img/global/Sun-logo.gif'
    #encoding = 'iso-8859-1'
    encoding = 'cp1252'
    remove_empty_feeds = True
    remove_javascript = True
    no_stylesheets = True
    extra_css = '''
        body{ text-align: justify; font-family:Arial,Helvetica,sans-serif; font-size:11px; font-size-adjust:none; font-stretch:normal; font-style:normal; font-variant:normal; font-weight:normal;}
    '''

    # Strip the copyright footer from the raw HTML before parsing
    preprocess_regexps = [
        (re.compile(r'<div class="foot-copyright".*?</div>', re.IGNORECASE | re.DOTALL), lambda match: '')]

    keep_only_tags = [
        dict(name='h1'),
        dict(name='h2', attrs={'class': 'medium centered'}),
        dict(name='div', attrs={'class': 'text-center'}),
        dict(name='div', attrs={'id': 'bodyText'})
        # dict(name='p')
    ]

    remove_tags = [
        #dict(name='head'),
        dict(attrs={'class': ['mystery-meat-link', 'ltbx-container', 'ltbx-var ltbx-hbxpn', 'ltbx-var ltbx-nav-loop', 'ltbx-var ltbx-url']}),
        dict(name='div', attrs={'class': 'cf'}),
        dict(attrs={'title': 'download flash'}),
        dict(attrs={'style': 'padding: 5px'})
    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            # Log in through Google's ClientLogin API and keep the Auth token
            request = urllib.urlencode([('Email', self.username),
                                        ('Passwd', self.password),
                                        ('service', 'reader'),
                                        ('accountType', 'HOSTED_OR_GOOGLE'),
                                        ('source', __appname__)])
            response = br.open('https://www.google.com/accounts/ClientLogin', request)
            auth = re.search(r'Auth=(\S*)', response.read()).group(1)
            # Send the token with every later request
            cookies = mechanize.CookieJar()
            br = mechanize.build_opener(mechanize.HTTPCookieProcessor(cookies))
            br.addheaders = [('Authorization', 'GoogleLogin auth=' + auth)]
        return br

    def get_feeds(self):
        # One feed per Google Reader folder (tag); the folder name is the TOC title
        feeds = []
        soup = self.index_to_soup('http://www.google.com/reader/api/0/tag/list')
        for tag_id in soup.findAll(True, attrs={'name': ['id']}):
            url = tag_id.contents[0]
            feeds.append((re.search(r'/([^/]*)$', url).group(1),
                          self.base_url + urllib.quote(url.encode('utf-8')) + self.get_options))
        return feeds

    def print_soup(self, soup):
        # Debugging helper, not called by the recipe
        print(soup)

    def postprocess_html(self, soup, first):
        # Convert every downloaded image to greyscale in place to shrink the ebook
        for tag in soup.findAll(lambda tag: tag.name.lower() == 'img' and tag.has_key('src')):
            iurl = tag['src']
            img = Image()
            img.open(iurl)
            img.type = "GrayscaleType"
            img.save(iurl)
        return soup

    #auto_cleanup = True
```
Last edited by scissors; 11-02-2011 at 03:52 PM. Reason: images to greyscale - by Starson17
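The two helper methods in the recipe, get_browser and get_feeds, do some plain string handling that can be tried out without calibre or a Google account. The sketch below is a standalone approximation, not part of the recipe: the function names (extract_auth, auth_header, feed_entry), the sample login response and the label ids are hypothetical; only the regexes, base URL and query options are copied from the recipe. It shows how the Auth token is pulled out of the login response and how each Google Reader folder becomes a (TOC title, feed URL) pair.

```python
import re

try:  # Python 3
    from urllib.parse import quote
except ImportError:  # Python 2, which the recipe itself targets
    from urllib import quote

# Copied from the recipe
BASE_URL = 'http://www.google.com/reader/atom/'
MAX_ARTICLES = 20
GET_OPTIONS = '?n=%d&xt=user/-/state/com.google/read' % MAX_ARTICLES


def extract_auth(clientlogin_body):
    """Hypothetical helper: pull the Auth token out of a login response
    body with the same regex get_browser() uses."""
    match = re.search(r'Auth=(\S*)', clientlogin_body)
    if match is None:
        raise ValueError('no Auth= line in response')
    return match.group(1)


def auth_header(token):
    """Hypothetical helper: the header tuple the recipe puts in br.addheaders."""
    return ('Authorization', 'GoogleLogin auth=' + token)


def feed_entry(tag_id):
    """Hypothetical helper: map a tag id to (title, url) like get_feeds().
    The title is the last path component, i.e. the folder name."""
    title = re.search(r'/([^/]*)$', tag_id).group(1)
    url = BASE_URL + quote(tag_id) + GET_OPTIONS
    return title, url


if __name__ == '__main__':
    # Made-up response body, for illustration only
    body = 'SID=abc\nLSID=def\nAuth=xyz123\n'
    print(auth_header(extract_auth(body)))
    for tag_id in ['user/-/label/News', 'user/-/label/Show Biz']:
        print(feed_entry(tag_id))
```

Because the title is just the last path component of the tag id, whatever you type as the folder name in Google Reader is what shows up in the TOC; spaces in the name are percent-encoded in the feed URL but kept in the title.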
#2
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
Improved Sun UK recipe
Fixed the occasional garbage appearing in the nav bar. Last edited 29/10/11.
Enjoy, without paying Mr Murdoch any money.
#3
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
30/10/11
Removed the "click here for slideshow" button from the sports pages, along with some other stray <class> markup that appeared in the articles as normal text. Changed the cover URL to the logo.
Last edited by scissors; 10-30-2011 at 04:13 AM.
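Stray markup like this can be dropped either with remove_tags, as the recipe does, or with preprocess_regexps, which works on the raw HTML before it is parsed. The standalone sketch below roughly mimics how calibre applies each (compiled pattern, replacement function) pair in order. The first rule is the recipe's own foot-copyright rule; the "click here for slideshow" rule and the apply_preprocess helper are hypothetical, for illustration only, and are not how the recipe actually fixed the slideshow button.

```python
import re

# (compiled pattern, replacement function) pairs, the shape used by
# calibre's preprocess_regexps.
PREPROCESS_REGEXPS = [
    # Copied from the recipe: drop the copyright footer div.
    (re.compile(r'<div class="foot-copyright".*?</div>', re.IGNORECASE | re.DOTALL),
     lambda match: ''),
    # Hypothetical extra rule: drop a stray "click here for slideshow" prompt.
    (re.compile(r'click here for slideshow', re.IGNORECASE),
     lambda match: ''),
]


def apply_preprocess(raw_html, rules=PREPROCESS_REGEXPS):
    """Hypothetical helper: run every rule over the raw HTML, in order."""
    for pattern, replace in rules:
        raw_html = pattern.sub(replace, raw_html)
    return raw_html


if __name__ == '__main__':
    page = ('<div id="bodyText"><p>Story text</p>'
            '<p>Click here for slideshow</p></div>'
            '<DIV class="foot-copyright">&copy; News Group</div>')
    print(apply_preprocess(page))
```

The non-greedy `.*?` stops at the first `</div>` after the footer opens, so a footer containing nested divs would only be partly removed; remove_tags, which works on the parsed tree, does not have that limitation.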
#4
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Do you get the pictures?
(In case anyone from outside the UK isn't aware, the Sun is best known for its pictures of topless girls.)
#5
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
Quote:
Actually, I'd say they're more famous for phone hacking these days (oh, that was the NotW, wasn't it? Of course the two have nothing to do with each other...)
#6
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
Thanks to the brilliant Starson17, images are now converted to grayscale. This turned a 5.9 MB epub into a 4.7 MB epub.
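The recipe does the conversion by setting img.type = "GrayscaleType" through calibre's ImageMagick wrapper. The standalone sketch below does not call ImageMagick; it only illustrates why greyscale saves space: a grey pixel needs one sample instead of three. The luma weights are the common ITU-R BT.601 ones and are an assumption, since ImageMagick may use a different formula depending on version and settings; the helper names are hypothetical.

```python
# Standalone illustration only; the recipe lets ImageMagick do this.

# ITU-R BT.601 luma weights (assumption: not necessarily ImageMagick's formula).
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_gray(pixel):
    """Return the 0-255 grey level for one (r, g, b) pixel."""
    r, g, b = pixel
    wr, wg, wb = LUMA_WEIGHTS
    return int(round(wr * r + wg * g + wb * b))


def rgb_bytes(pixels):
    """Raw RGB data: three bytes per pixel."""
    return bytes(bytearray(channel for p in pixels for channel in p))


def grayscale_bytes(pixels):
    """Raw greyscale data: one byte per pixel."""
    return bytes(bytearray(to_gray(p) for p in pixels))


def percent_saving(before, after):
    """Percentage size reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == '__main__':
    pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
    print([to_gray(p) for p in pixels])
    print(len(rgb_bytes(pixels)), len(grayscale_bytes(pixels)))
    # Sizes reported in the post: 5.9 MB before, 4.7 MB after.
    print('%.0f%% smaller' % percent_saving(5.9, 4.7))
```

Uncompressed, the grey data is a third the size of the RGB data, yet the epub only shrank by about 20%, presumably because the images were already compressed and the book also contains text and markup.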
#7
creator of calibre
Posts: 45,185
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
#8
Addict
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
#9
creator of calibre
Posts: 45,185
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Thread | Thread Starter | Forum | Replies | Last Post |
--- | --- | --- | --- | --- |
Problems with the sun | mokel22 | enTourage eDGe | 2 | 07-10-2011 04:25 PM |
Baltimore sun help? | copyrite | Recipes | 2 | 10-31-2010 03:59 PM |
PRS-900 Fading in the sun | vxf | Sony Reader | 15 | 08-21-2010 11:36 PM |
PRS-900 Baltimore Sun? | luqmaninbmore | Sony Reader | 1 | 02-10-2010 05:28 PM |
Sun Fading | SanAntone | Amazon Kindle | 23 | 07-08-2009 06:36 PM |