Register Guidelines E-Books Today's Posts Search

Go Back   MobileRead Forums > E-Book Software > Calibre > Recipes

Notices

Reply
 
Thread Tools Search this Thread
Old 02-11-2012, 09:27 AM   #1
scissors
Addict
scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.
 
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
update daily mirror uk

Well no-one seems to read the mirror apart from me :-)

But for anyone else who may be wondering why it's stopped working - they pulled the rss feeds.

anyway - here's a modded recipe using feed43 that works

Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
import re
from calibre.utils.magick import Image, PixelWand
class AdvancedUserRecipe1306061239(BasicNewsRecipe):
    title          = u'The Daily Mirror'
    description = 'News as provide by The Daily Mirror -UK'

    __author__ = 'Dave Asbury'
    # last updated 11/2/12
    language = 'en_GB'

    cover_url = 'http://yookeo.com/screens/m/i/mirror.co.uk.jpg'

    masthead_url = 'http://www.nmauk.co.uk/nma/images/daily_mirror.gif'


    oldest_article = 1
    max_articles_per_feed = 5
    remove_empty_feeds = True
    remove_javascript     = True
    no_stylesheets = True
    auto_cleanup = True
    #conversion_options = { 'linearize_tables' : True }
    
    
    #keep_only_tags = [
      #     dict(name='h1'),
      # dict(name='div',attrs={'id' : 'body-content'}),
       #dict(name='div',atts={'class' : 'article-body'}),
       #dict(attrs={'class' : ['article-attr','byline append-1','published']}),
       #dict(name='p'),
       # ]

    #remove_tags_after = [dict (name='div',attrs={'class' : 'related'})]

    remove_tags = [
           dict(name='title'),
           dict(name='div',attrs={'class' : ['inline-ad span-16 last','caption']}),
          # dict(name='div',attrs={'id' : ['sidebar','menu','search-box','roffers-top']}),
           #dict(name='div',attrs={'class' :['inline-ad span-16 last','article-resize','related','list teasers']}),
           #dict(attrs={'class' : ['channellink','article-tags','replace','append-html']}),
          ]
    
   # preprocess_regexps = [
    #(re.compile(r'<dl class="q-search">.*?</dl>', re.IGNORECASE | re.DOTALL), lambda match: '')]
    preprocess_regexps = [
    (re.compile(r'- mirror.co.uk', re.IGNORECASE | re.DOTALL), lambda match: '')]
    
    preprocess_regexps = [
    (re.compile(r'Advertisement >>', re.IGNORECASE | re.DOTALL), lambda match: '')]
    
    #preprocess_regexps = [
    #(re.compile(r'Sponsored Links', re.IGNORECASE | re.DOTALL), lambda match: '')]
    
    feeds          = [

        (u'UK News', u'http://feed43.com/0287771688643868.xml')
        ,(u'Tech News', u'http://feed43.com/2455520588350501.xml')
        ,(u'Weird World','http://feed43.com/0863800333634654.xml')
        ,(u'Sport','http://feed43.com/7713243036546130.xml')
        ,(u'Sport : Boxing ','http://feed43.com/0414732220804255.xml')
        ,(u'Sport : Rugby Union','http://feed43.com/4710138762362383.xml')
        ,(u'Sport : Other','http://feed43.com/4501416886323415.xml')
        ,(u'TV and Film','http://feed43.com/5238302853765104.xml')
        ,(u'Celebs','http://feed43.com/8770061048844683.xml')        
        ,(u'Life Style : Family','http://feed43.com/4356170742410338.xml')
         ,(u'Travel','http://feed43.com/1436576006476607.xml')



           # example of commented out feed not needed ,(u'Travel','http://www.mirror.co.uk/advice/travel/rss.xml')
  ]
    extra_css  = '''
	body{ text-align: justify; font-family:Arial,Helvetica,sans-serif; font-size:11px; font-size-adjust:none; font-stretch:normal; font-style:normal; font-variant:normal; font-weight:normal;}
                    h1{ font-size:18px;}
                    img { display:block}
                	 '''
    def postprocess_html(self, soup, first):
        #process all the images
        for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
            iurl = tag['src']
            img = Image()
            img.open(iurl)
            if img < 0:
                raise RuntimeError('Out of memory')
            img.type = "GrayscaleType"
            img.save(iurl)
        return soup

Last edited by scissors; 02-11-2012 at 10:07 AM.
scissors is offline   Reply With Quote
Reply


Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Free (Kindle/Nook/Christianbook) Daily Light on the Daily Path [Devotional] ATDrake Deals and Resources (No Self-Promotion or Affiliate Links) 4 04-20-2012 02:48 PM
Improved Daily Mirror UK recipe scissors Recipes 2 12-27-2011 03:14 AM
UK Daily Mirror Recipe. scissors Recipes 0 06-11-2011 04:44 AM
Daily update June 24 slow? PKFFW OpenInkpot 0 06-27-2010 02:02 AM


All times are GMT -4. The time now is 11:05 PM.


MobileRead.com is a privately owned, operated and funded community.