06-27-2015, 11:52 AM #1
scissors
Addict
 
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
Daily Express

The Daily Express (UK) RSS feeds have not been updated since last Monday.

Here is a quick Feed43 fix for the recipe until (or if) they come back.
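For anyone unfamiliar with Feed43: it builds an RSS feed by scraping a web page with an item pattern, like the one noted in the recipe comment below (`<div {*}<a href="{%}"{*}<h4>{%}</h4>`). Assuming the usual Feed43 convention that `{%}` captures a field and `{*}` skips any text, the pattern can be approximated in plain Python like this. `feed43_to_regex` and the sample HTML are made up for illustration; they are not part of Feed43 or calibre, and the real express.co.uk markup may differ.

```python
import re


def feed43_to_regex(pattern):
    # Hypothetical helper: assumes {%} = captured field, {*} = skip any text,
    # and everything else is literal HTML that must match exactly.
    parts = re.split(r'(\{%\}|\{\*\})', pattern)
    out = []
    for part in parts:
        if part == '{%}':
            out.append('(.*?)')          # captured field (lazy)
        elif part == '{*}':
            out.append('.*?')            # skipped text (lazy, not captured)
        else:
            out.append(re.escape(part))  # literal text
    # DOTALL so that skips and captures can cross line breaks in the page
    return re.compile(''.join(out), re.DOTALL)


ITEM_PATTERN = '<div {*}<a href="{%}"{*}<h4>{%}</h4>'

# Made-up markup shaped roughly like a news index page
sample_html = '''
<div class="story"><a href="/news/uk/1">
  <img src="a.jpg"><h4>First headline</h4></a></div>
<div class="story"><a href="/news/uk/2"><h4>Second headline</h4></a></div>
'''

items = feed43_to_regex(ITEM_PATTERN).findall(sample_html)
for link, headline in items:
    print(link, '->', headline)
```

Each match is one feed item; in Feed43 itself the captured values are then mapped to item fields such as the link and title in its output settings.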

Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe
from calibre import browser


class AdvancedUserRecipe1390132023(BasicNewsRecipe):
    title = u'Daily Express'
    __author__ = 'Dave Asbury'
    # 27.6.15: using Feed43 because the RSS feeds are dead
    # Feed43 item pattern: <div {*}<a href="{%}"{*}<h4>{%}</h4>
    oldest_article = 1.5
    max_articles_per_feed = 10
    compress_news_images = True
    compress_news_images_max_size = 20
    ignore_duplicate_articles = {'title', 'url'}
    masthead_url = 'http://cdn.images.dailyexpress.co.uk/img/page/express_logo.png'
    auto_cleanup_keep = '//*[@class="author"]|//section[@class="photo changeSpace"]'
    auto_cleanup = True
    no_stylesheets = False

    # Remove "| ... | ... | Daily Express" site-name trails from the downloaded HTML
    preprocess_regexps = [
        (re.compile(r'\| [\w].+?\| [\w].+?\| Daily Express', re.IGNORECASE | re.DOTALL), lambda match: ''),
    ]

    # Feed43 replacements; the "old:" comments are the previous Express RSS URLs
    feeds = [
        (u'UK News', u'http://feed43.com/3460616116055543.xml'),
        # old: http://feeds.feedburner.com/daily-express-uk-news
        # old: http://www.express.co.uk/posts/rss/1/uk
        (u'World News', u'http://feed43.com/5650105317448722.xml'),
        # old: http://www.express.co.uk/posts/rss/78/world
        (u'Showbiz News', u'http://feed43.com/2564008080442425.xml'),
        (u'Finance', u'http://feed43.com/8636615325246501.xml'),
        # old: http://www.express.co.uk/posts/rss/21/finance
        (u'Sport - Boxing', u'http://feed43.com/7570233481503246.xml'),
        (u'Sport - Rugby Union', u'http://feed43.com/4235483647118470.xml'),
        (u'Sport - Others', u'http://feed43.com/6106345668326737.xml'),
        # old: http://www.express.co.uk/posts/rss/65/sport
        (u'Entertainment', u'http://feed43.com/8864645080210731.xml'),
        # old: http://www.express.co.uk/posts/rss/18/entertainment
        (u'Lifestyle', u'http://feed43.com/8705161426770855.xml'),
        # old: http://www.express.co.uk/posts/rss/8/life&style
        (u'Travel', u'http://feed43.com/6547373884767554.xml'),
    ]
    
    def get_cover_url(self):
        # Fallback image used if today's front page cannot be found or opened
        fallback = 'http://cdn.images.express.co.uk/img/static/ourpaper/header-back-issue-papers.jpg'
        soup = self.index_to_soup('http://www.express.co.uk/ourpaper/')
        # Today's front page is the tag whose src points into /img/covers/
        cov = soup.find(attrs={'src': re.compile(r'http://cdn\.images\.express\.co\.uk/img/covers/')})
        if cov is None:
            self.log('Cover image not found, using fallback')
            return fallback
        cover_url = cov['src']
        self.log('Cover candidate:', cover_url)
        br = browser()
        # With redirects disabled, a redirected request raises an error,
        # so a redirect falls through to the fallback image
        br.set_handle_redirect(False)
        try:
            br.open_novisit(cover_url)
        except Exception:
            self.log('Cover not reachable, using fallback')
            cover_url = fallback
        return cover_url


    extra_css = '''
        /* h1{font-weight:bold;font-size:175%;} */
        h2{display: block;margin-left: auto;margin-right: auto;width:100%;font-weight:bold;font-size:175%;}
        /* p{font-size:14px;} */
        /* body{font-size:14px;} */
        .newsCaption {display: block;margin-left: auto;margin-right: auto;width:100%;font-size:40%;}
        .publish-info {font-size:50%;}
        .photo img {display: block;margin-left: auto;margin-right: auto;width:100%;}
    '''
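The `preprocess_regexps` entry is easy to misread, so here is a standalone check of what it removes. calibre runs each rule on the downloaded HTML and replaces every match with whatever the function returns, here an empty string. The sample title is hypothetical; real Express pages may format it differently.

```python
import re

# Same pattern and flags as the recipe's preprocess_regexps entry
TITLE_SUFFIX = re.compile(r'\| [\w].+?\| [\w].+?\| Daily Express',
                          re.IGNORECASE | re.DOTALL)


def strip_site_suffix(html):
    # re.sub (like calibre) calls the function with each match object;
    # returning '' deletes the matched text
    return TITLE_SUFFIX.sub(lambda match: '', html)


# Hypothetical page title, for illustration only
sample = '<title>Big story | UK | News | Daily Express</title>'
print(strip_site_suffix(sample))  # prints "<title>Big story </title>"
```

Because of `re.DOTALL`, the lazy `.+?` can also cross line breaks, so a stray `| ` earlier in the page could widen the match; dropping `re.DOTALL` would keep each match on a single line.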