MobileRead Forums - View Single Post

leo738 · 03-30-2013, 09:18 AM

Hello All,

I had a look at some of the links & it's possible to get the recipe working, but it's not as extensive as the previous version, missing the magazine & lots of other sections. It's a shame but at least it's something. The only sections are now:

News
Business
Debate
Life Style
Culture
Sport

I notice the links contain numbers at the end which may be subject to change, will have to wait & see!

Here's the recipe:

Code:

__license__  = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns, 2013 Modified by O. O'H"
'''
irishtimes.com
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe

class IrishTimes(BasicNewsRecipe):
    title          = u'The Irish Times'
    encoding  = 'ISO-8859-15'
    __author__    = "Derry FitzGerald, Ray Kinsella, David O'Callaghan, Phil Burns & O. O'H"
    language = 'en_IE'
    timefmt = ' (%A, %B %d, %Y)'

    oldest_article = 1.0
    max_articles_per_feed  = 100
    no_stylesheets = True
    simultaneous_downloads= 5

    r = re.compile('.*(?P<url>http:\/\/(www.irishtimes.com)|(rss.feedsportal.com\/c)\/.*\.html?).*')
    keep_only_tags  = dict(name='article', attrs={'class':'article row'})
    remove_tags    = [dict(name='div', attrs={'class':'footer'})]
    extra_css      = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt  }'

    feeds          = [
 		  			  ('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
                      ('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
                      ('Debate', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
                      ('Life Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
                      ('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
                      ('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
                    ]

    def print_version(self, url):
        if url.count('rss.feedsportal.com'):
            #u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
            u = url.find('irishtimes')
            u = 'http://www.irishtimes.com' + url[u + 12:]
            u = u.replace('0C', '/')
            u = u.replace('A', '')
            u = u.replace('0Bhtml/story01.htm', '_pf.html')
        else:
            u = url.replace('.html','_pf.html')
        return u

    def get_article_url(self, article):
        return article.link