Register Guidelines E-Books Today's Posts Search

Go Back   MobileRead Forums > E-Book Software > Calibre > Recipes

Notices

Reply
 
Thread Tools Search this Thread
Old 11-22-2014, 03:18 AM   #1
scissors
Addict
scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.scissors ought to be getting tired of karma fortunes by now.
 
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
daily express update

image tags changed

Spoiler:

Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe
from calibre import browser
class AdvancedUserRecipe1390132023(BasicNewsRecipe):
    title          = u'Daily Express'
    __author__ = 'Dave Asbury'
   # 21.11.14 written due to website changes
    oldest_article = 1
    max_articles_per_feed = 10
    compress_news_images = True
    compress_news_images_max_size = 30
    ignore_duplicate_articles = {'title', 'url'}
    masthead_url = 'http://cdn.images.dailyexpress.co.uk/img/page/express_logo.png'
    auto_cleanup_keep = '//*[@class="author"]|//section[@class="photo changeSpace"]'
    auto_cleanup = True
    no_stylesheets        = False
    
    preprocess_regexps = [
		 (re.compile(r'\| [\w].+?\| [\w].+?\| Daily Express', re.IGNORECASE | re.DOTALL), lambda match: ''),
         	
         		]
    feeds          = [

		(u'UK News', u'http://www.express.co.uk/posts/rss/1/uk'),
                                (u'World News',u'http://www.express.co.uk/posts/rss/78/world'),
                                (u'Finance',u'http://www.express.co.uk/posts/rss/21/finance'),
                                (u'Sport',u'http://www.express.co.uk/posts/rss/65/sport'),
                                (u'Entertainment',u'http://www.express.co.uk/posts/rss/18/entertainment'),
                                (u'Lifestyle',u'http://www.express.co.uk/posts/rss/8/life&style'),
                                (u'Fun',u'http://www.express.co.uk/posts/rss/110/fun'),
                        ]
    
    def get_cover_url(self):
        print '============Cover ================='
        print
        soup = self.index_to_soup('http://www.express.co.uk/ourpaper/')
        cov = soup.find(attrs={'src' : re.compile('http://cdn.images.express.co.uk/img/covers/')})
        cov=str(cov)
        print '^^^^^^^', cov
        cov2 =  re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', cov)

        cov=str(cov2)
        cov=cov[2:len(cov)-2]

        print '&&&&&&&&',cov,'***'
        #cover_url=cov
        br = browser()
        br.set_handle_redirect(False)
        try:
            br.open_novisit(cov)
            cover_url = cov
        except:
            cover_url ='http://cdn.images.express.co.uk/img/static/ourpaper/header-back-issue-papers.jpg'

        return cover_url


    extra_css = '''
                    #h1{font-weight:bold;font-size:175%;}
                    h2{display: block;margin-left: auto;margin-right: auto;width:100%;font-weight:bold;font-size:175%;}
                    #p{font-size:14px;}
                    #body{font-size:14px;}
                    .newsCaption {display: block;margin-left: auto;margin-right: auto;width:100%;font-size:40%;}
                    .publish-info {font-size:50%;}
                    .photo img {display: block;margin-left: auto;margin-right: auto;width:100%;}
      '''
scissors is offline   Reply With Quote
Reply


Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Daily Express Recipe update 16.11.13 scissors Recipes 0 11-16-2013 07:10 AM
Daily Express 9/9/13 scissors Recipes 0 09-09-2013 03:06 PM
The Daily Express scissors Recipes 1 08-17-2013 08:26 AM
New Musical Express update 9/6/12 scissors Recipes 0 06-09-2012 07:53 AM
update daily mirror uk scissors Recipes 0 02-11-2012 09:27 AM


All times are GMT -4. The time now is 05:20 AM.


MobileRead.com is a privately owned, operated and funded community.