06-27-2015, 11:52 AM   #1
scissors
Addict
 
Posts: 241
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
Daily Express

The Daily Express (UK) RSS feeds have not updated since last Monday.

Here is a quick Feed43 fix for the recipe until (or if) they come back.

Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe
from calibre import browser


class AdvancedUserRecipe1390132023(BasicNewsRecipe):
    title = u'Daily Express'
    __author__ = 'Dave Asbury'
    # 27.6.15 using feed43 as RSS feeds dead
    # feed43 string = <div {*}<a href="{%}"{*}<h4>{%}</h4>
    oldest_article = 1.5
    max_articles_per_feed = 10
    compress_news_images = True
    compress_news_images_max_size = 20
    ignore_duplicate_articles = {'title', 'url'}
    masthead_url = 'http://cdn.images.dailyexpress.co.uk/img/page/express_logo.png'
    auto_cleanup_keep = '//*[@class="author"]|//section[@class="photo changeSpace"]'
    auto_cleanup = True
    no_stylesheets = False

    # Remove breadcrumb-style "| ... | ... | Daily Express" text from pages
    preprocess_regexps = [
        (re.compile(r'\| [\w].+?\| [\w].+?\| Daily Express', re.IGNORECASE | re.DOTALL),
         lambda match: ''),
    ]
    feeds = [
        (u'UK News', u'http://feed43.com/3460616116055543.xml'),
        # http://feeds.feedburner.com/daily-express-uk-news
        # http://www.express.co.uk/posts/rss/1/uk
        (u'World News', u'http://feed43.com/5650105317448722.xml'),
        # http://www.express.co.uk/posts/rss/78/world
        (u'Showbiz News', u'http://feed43.com/2564008080442425.xml'),
        (u'Finance', u'http://feed43.com/8636615325246501.xml'),
        # http://www.express.co.uk/posts/rss/21/finance
        (u'Sport - Boxing', u'http://feed43.com/7570233481503246.xml'),
        (u'Sport - Rugby Union', u'http://feed43.com/4235483647118470.xml'),
        (u'Sport - Others', u'http://feed43.com/6106345668326737.xml'),
        # http://www.express.co.uk/posts/rss/65/sport
        (u'Entertainment', u'http://feed43.com/8864645080210731.xml'),
        # http://www.express.co.uk/posts/rss/18/entertainment
        (u'Lifestyle', u'http://feed43.com/8705161426770855.xml'),
        # http://www.express.co.uk/posts/rss/8/life&style
        (u'Travel', u'http://feed43.com/6547373884767554.xml'),
    ]
    
    def get_cover_url(self):
        fallback = 'http://cdn.images.express.co.uk/img/static/ourpaper/header-back-issue-papers.jpg'
        # Find today's front-page image on the "our paper" page
        soup = self.index_to_soup('http://www.express.co.uk/ourpaper/')
        cov = soup.find(attrs={'src': re.compile('http://cdn.images.express.co.uk/img/covers/')})
        if cov is None:
            return fallback
        cover_url = cov['src']
        self.log('Cover URL:', cover_url)
        # Try to open the image without following redirects;
        # if that fails, use the generic back-issue banner instead
        br = browser()
        br.set_handle_redirect(False)
        try:
            br.open_novisit(cover_url)
        except Exception:
            cover_url = fallback
        return cover_url


    extra_css = '''
                    /* "#h1", "#p" and "#body" are ID selectors, not element selectors,
                       so those three rules only apply to elements with id="h1" etc. */
                    #h1{font-weight:bold;font-size:175%;}
                    h2{display: block;margin-left: auto;margin-right: auto;width:100%;font-weight:bold;font-size:175%;}
                    #p{font-size:14px;}
                    #body{font-size:14px;}
                    .newsCaption {display: block;margin-left: auto;margin-right: auto;width:100%;font-size:40%;}
                    .publish-info {font-size:50%;}
                    .photo img {display: block;margin-left: auto;margin-right: auto;width:100%;}
      '''
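For anyone wanting to reuse the Feed43 workaround: a Feed43 item pattern uses {%} for a field to capture and {*} for text to skip. The sketch below is only a rough local approximation (not Feed43's own engine) that turns such a pattern into a Python regex, so the pattern from the recipe comment can be tried against a saved page before setting up a feed. It assumes both placeholders match as little text as possible; the feed43_to_regex helper and the sample HTML are hypothetical.

```python
import re


def feed43_to_regex(pattern):
    """Translate a Feed43-style item pattern into a compiled regex (hypothetical helper).

    {%} -> captured field, {*} -> skipped text; everything else is literal.
    Both placeholders are assumed to be non-greedy.
    """
    regex_parts = []
    # The capturing group makes re.split keep the placeholders in its result
    for part in re.split(r'(\{%\}|\{\*\})', pattern):
        if part == '{%}':
            regex_parts.append('(.*?)')
        elif part == '{*}':
            regex_parts.append('.*?')
        else:
            regex_parts.append(re.escape(part))
    return re.compile(''.join(regex_parts), re.DOTALL)


# Item pattern taken from the recipe comment
ITEM_PATTERN = '<div {*}<a href="{%}"{*}<h4>{%}</h4>'

# Hypothetical markup, shaped only to show the matching behaviour
SAMPLE_HTML = '''
<div class="post"><a href="/news/uk/1/storm" class="x"><img/><h4>Storm warning</h4></a></div>
<div class="post"><a href="/news/uk/2/rail" class="x"><h4>Rail strike</h4></a></div>
'''

# Each match yields (link, headline) because the pattern has two {%} fields
items = feed43_to_regex(ITEM_PATTERN).findall(SAMPLE_HTML)
for link, headline in items:
    print(link, headline)
```

The first {%} becomes the item link and the second the headline, which is how the recipe's feed43 string lines up with the href and the h4 title on the Express pages.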
08-02-2015, 03:11 AM   #2
scissors
Express recipe update, 1/8/2015

The official RSS feeds are working again.

Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe
from calibre import browser


class AdvancedUserRecipe1390132023(BasicNewsRecipe):
    title = u'Daily Express'
    __author__ = 'Dave Asbury'
    # 1.8.15 official feedburner feeds live again
    # 27.6.15 using feed43 as RSS feeds dead
    # feed43 string = <div {*}<a href="{%}"{*}<h4>{%}</h4>
    oldest_article = 1.5
    max_articles_per_feed = 10
    compress_news_images = True
    compress_news_images_max_size = 20
    ignore_duplicate_articles = {'title', 'url'}
    masthead_url = 'http://cdn.images.dailyexpress.co.uk/img/page/express_logo.png'
    auto_cleanup_keep = '//*[@class="author"]|//section[@class="photo changeSpace"]'
    auto_cleanup = True
    no_stylesheets = False

    # Remove breadcrumb-style "| ... | ... | Daily Express" text from pages
    preprocess_regexps = [
        (re.compile(r'\| [\w].+?\| [\w].+?\| Daily Express', re.IGNORECASE | re.DOTALL),
         lambda match: ''),
    ]
    feeds = [
        # (u'UK News', u'http://feed43.com/3460616116055543.xml'),
        (u'UK News', u'http://feeds.feedburner.com/daily-express-uk-news'),
        # http://www.express.co.uk/posts/rss/1/uk
        (u'World News', u'http://feeds.feedburner.com/daily-express-world-news'),
        # (u'World News', u'http://feed43.com/5650105317448722.xml'),
        # http://www.express.co.uk/posts/rss/78/world
        (u'Showbiz News', u'http://feeds.feedburner.com/daily-express-showbiz-news'),
        # (u'Showbiz News', u'http://feed43.com/2564008080442425.xml'),
        (u'Finance', u'http://feeds.feedburner.com/daily-express-finance-news'),
        # (u'Finance', u'http://feed43.com/8636615325246501.xml'),
        # http://www.express.co.uk/posts/rss/21/finance
        # (u'Sport - Boxing', u'http://feed43.com/7570233481503246.xml'),
        (u'Sport - Boxing', u'http://feeds.feedburner.com/daily-express-boxing-news'),
        (u'Sport - Rugby Union', u'http://feeds.feedburner.com/daily-express-rugby-union-news'),
        # (u'Sport - Rugby Union', u'http://feed43.com/4235483647118470.xml'),
        # (u'Sport - Others', u'http://feed43.com/6106345668326737.xml'),
        (u'Sport - Others', u'http://feeds.feedburner.com/daily-express-other-sport-news'),
        # http://www.express.co.uk/posts/rss/65/sport
        (u'Entertainment', u'http://feeds.feedburner.com/daily-express-entertainment-news'),
        # (u'Entertainment', u'http://feed43.com/8864645080210731.xml'),
        # http://www.express.co.uk/posts/rss/18/entertainment
        (u'Lifestyle', u'http://feeds.feedburner.com/daily-express-life-and-style-news'),
        # (u'Lifestyle', u'http://feed43.com/8705161426770855.xml'),
        # http://www.express.co.uk/posts/rss/8/life&style
        (u'Travel', u'http://feeds.feedburner.com/daily-express-travel'),
        # (u'Travel', u'http://feed43.com/6547373884767554.xml'),
    ]
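    # starsons code
    # Drop articles whose titles contain "WATCH:" (case-insensitive)
    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            # Iterate over a copy so removing an article does not skip the next one
            for article in feed.articles[:]:
                self.log('article.title is:', article.title)
                if 'WATCH:' in article.title.upper():
                    feed.articles.remove(article)
        return feeds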
  
    def get_cover_url(self):
        fallback = 'http://cdn.images.express.co.uk/img/static/ourpaper/header-back-issue-papers.jpg'
        # Find today's front-page image on the "our paper" page
        soup = self.index_to_soup('http://www.express.co.uk/ourpaper/')
        cov = soup.find(attrs={'src': re.compile('http://cdn.images.express.co.uk/img/covers/')})
        if cov is None:
            return fallback
        cover_url = cov['src']
        self.log('Cover URL:', cover_url)
        # Try to open the image without following redirects;
        # if that fails, use the generic back-issue banner instead
        br = browser()
        br.set_handle_redirect(False)
        try:
            br.open_novisit(cover_url)
        except Exception:
            cover_url = fallback
        return cover_url


    extra_css = '''
                    /* "#h1", "#p" and "#body" are ID selectors, not element selectors,
                       so those three rules only apply to elements with id="h1" etc. */
                    #h1{font-weight:bold;font-size:175%;}
                    h2{display: block;margin-left: auto;margin-right: auto;width:100%;font-weight:bold;font-size:175%;}
                    #p{font-size:14px;}
                    #body{font-size:14px;}
                    .newsCaption {display: block;margin-left: auto;margin-right: auto;width:100%;font-size:40%;}
                    .publish-info {font-size:50%;}
                    .photo img {display: block;margin-left: auto;margin-right: auto;width:100%;}
      '''
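On the parse_feeds filter: it loops over feed.articles[:] (a copy) rather than feed.articles itself, because removing items from a list while iterating over that same list skips the element after each removal. The stand-alone sketch below shows the difference. Feed and Article here are hypothetical stand-ins, not calibre's real classes; only the title and articles attributes the recipe relies on are modelled, and the headlines are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:  # hypothetical stand-in for calibre's article objects
    title: str


@dataclass
class Feed:  # hypothetical stand-in for calibre's feed objects
    title: str
    articles: List[Article] = field(default_factory=list)


def drop_watch_naive(feeds):
    # Buggy: mutating the list being iterated skips the next article
    for feed in feeds:
        for article in feed.articles:
            if 'WATCH:' in article.title.upper():
                feed.articles.remove(article)
    return feeds


def drop_watch(feeds):
    # Same test as the recipe, but iterating over a copy of the list
    for feed in feeds:
        for article in feed.articles[:]:
            if 'WATCH:' in article.title.upper():
                feed.articles.remove(article)
    return feeds


def make_feeds():
    # Two consecutive "watch" items, which is the case the naive loop gets wrong
    return [Feed('UK News', [Article('WATCH: Storm hits coast'),
                             Article('Watch: Flood footage'),
                             Article('Rail strike called off')])]


naive_titles = [a.title for a in drop_watch_naive(make_feeds())[0].articles]
fixed_titles = [a.title for a in drop_watch(make_feeds())[0].articles]
print(naive_titles)  # the second "Watch:" item survives
print(fixed_titles)
```

Because the title is upper-cased before the check, "Watch:" and "WATCH:" are both caught; the copy is what makes sure neither gets skipped.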




All times are GMT -4.

