03-30-2013, 09:18 AM #8
leo738
Enthusiast
Join Date: Jul 2011
Device: Kindle 3
Hello All,

I had a look at some of the links and it's possible to get the recipe working, but it's not as extensive as the previous version: the magazine and lots of other sections are missing. It's a shame, but at least it's something. The only sections now are:
  1. News
  2. Business
  3. Debate
  4. Life Style
  5. Culture
  6. Sport

I notice the feed links end in numbers (e.g. 1.1319192 for News), which may be subject to change, so we'll have to wait & see!
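Since those trailing numbers look like the part most likely to change, it may help to record them so a change is easy to spot. Here's a minimal sketch in plain Python (no calibre needed), assuming every feed URL keeps the `-<digits>.<digits>` ending seen in the recipe below. `feed_id` and `changed_feeds` are hypothetical helpers of my own, not part of calibre:

```python
import re

# Assumption: each feed URL ends in "-<digits>.<digits>", like the cmlink
# links in this recipe. URLs without such an ending yield None.
_TRAILING_ID = re.compile(r'-(\d+\.\d+)$')


def feed_id(url):
    """Return the trailing numeric ID of a feed URL (hypothetical helper)."""
    match = _TRAILING_ID.search(url)
    return match.group(1) if match else None


def changed_feeds(old_feeds, new_feeds):
    """List titles present in both (title, url) lists whose trailing ID differs."""
    old_ids = {title: feed_id(url) for title, url in old_feeds}
    return [title for title, url in new_feeds
            if title in old_ids and old_ids[title] != feed_id(url)]


saved = [('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192')]
print(feed_id(saved[0][1]))  # prints 1.1319192
```

Keeping a copy of the current feeds list and running `changed_feeds(saved, current)` after the site changes would list any section whose number has moved.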


Here's the recipe:

Code:
__license__  = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns, 2013 Modified by O. O'H"
'''
irishtimes.com
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe

class IrishTimes(BasicNewsRecipe):
    title          = u'The Irish Times'
    encoding  = 'ISO-8859-15'
    __author__    = "Derry FitzGerald, Ray Kinsella, David O'Callaghan, Phil Burns & O. O'H"
    language = 'en_IE'
    timefmt = ' (%A, %B %d, %Y)'

    oldest_article = 1.0
    max_articles_per_feed  = 100
    no_stylesheets = True
    simultaneous_downloads = 5

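    # Not referenced elsewhere in this recipe; kept from the previous version.
    # Matches irishtimes.com or feedsportal article URLs ending in .htm/.html.
    r = re.compile(r'.*(?P<url>http://(www\.irishtimes\.com|rss\.feedsportal\.com/c)/.*\.html?).*')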
    keep_only_tags  = [dict(name='article', attrs={'class':'article row'})]
    remove_tags    = [dict(name='div', attrs={'class':'footer'})]
    extra_css      = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt  }'

    feeds          = [
                      ('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
                      ('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
                      # NOTE: same URL as News above; Debate needs its own cmlink URL.
                      ('Debate', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
                      ('Life Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
                      ('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
                      ('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
                    ]

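    def print_version(self, url):
        # Map an article URL to its printer-friendly ("_pf") version.
        if 'rss.feedsportal.com' in url:
            # Feedsportal links encode the real URL after 'irishtimes'
            # ('0C' stands for '/', '0B' for '.').
            #u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
            u = url.find('irishtimes')
            # Skip 'irishtimes' (10 chars) plus the 2-char code after it.
            u = 'http://www.irishtimes.com' + url[u + 12:]
            u = u.replace('0C', '/')
            # Note: this removes every uppercase 'A' in the URL.
            u = u.replace('A', '')
            u = u.replace('0Bhtml/story01.htm', '_pf.html')
        else:
            u = url.replace('.html', '_pf.html')
        return u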

    def get_article_url(self, article):
        return article.link
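One thing to watch in the recipe above: the News and Debate entries point at the same cmlink URL, so the Debate section would likely just repeat the News articles. A quick check like this, in plain Python outside calibre, catches that kind of copy-paste slip. `duplicate_feeds` is a hypothetical helper, not part of calibre, and the list is copied from the recipe's `feeds`:

```python
from collections import defaultdict

# Copied from the recipe's feeds list.
FEEDS = [
    ('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
    ('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
    ('Debate', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
    ('Life Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
    ('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
    ('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
]


def duplicate_feeds(feeds):
    """Return {url: [titles]} for every URL used by more than one feed."""
    by_url = defaultdict(list)
    for title, url in feeds:
        by_url[url].append(title)
    return {url: titles for url, titles in by_url.items() if len(titles) > 1}


for url, titles in duplicate_feeds(FEEDS).items():
    print(', '.join(titles), '->', url)
# prints: News, Debate -> http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192
```

Once the correct Debate link turns up, the check should print nothing.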