MobileRead Forums > E-Book Software > Calibre > Recipes
03-12-2013, 06:24 AM #1
leo738
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3
Updated Irish Times recipe?

Hello All,

The Irish Times website was updated over the last weekend, and since then the recipe seems to be broken. Has anybody come up with an update?

Thanks,

Leo
03-12-2013, 08:38 AM #2
oneillpt
Connoisseur
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
Quote:
Originally Posted by leo738
Hello All,

The Irish Times website was updated over the last weekend, and since then the recipe seems to be broken. Has anybody come up with an update?

Thanks,

Leo
The following are the essential changes to get content extracted again:
Code:
encoding  = 'UTF-8'
instead of
Code:
encoding  = 'ISO-8859-15'
Code:
keep_only_tags  = [dict(name='article', attrs={'class':'article row'})]
instead of any existing keep_only_tags

Code:
remove_tags    = [dict(name='div', attrs={'class':'topics_holder'}),
                  dict(name='div', attrs={'class':'social_article_share'})]
instead of any existing remove_tags.

I'm not posting a complete recipe - mine is rather heavily customised to extract only new articles, except on one chosen day each week, when it extracts all of them.

It looks as if some further changes may be needed to the chosen feeds. I'll add another post here if I find any, but the changes above should get things going again for now.
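To make the effect of those three settings concrete, here is a toy illustration in plain Python (standard-library `html.parser` only). It is not calibre's implementation: it simply collects the text inside `<article class="article row">` and skips anything inside the two removed `div`s, run on a made-up page. The page markup, the `ToyRecipeFilter` name and the exact-string class match are all assumptions for illustration; the depth counting also assumes every tag is closed (no `<img>` or `<br>`).

```python
from html.parser import HTMLParser

# Toy stand-ins for the recipe settings above (exact match on the class string).
KEEP = ('article', 'article row')
REMOVE = {('div', 'topics_holder'), ('div', 'social_article_share')}


class ToyRecipeFilter(HTMLParser):
    """Collect text inside KEEP, skipping anything inside a REMOVE element.

    Illustration only, not calibre's code. Depth counting assumes every
    start tag has a matching end tag (no void elements such as <img>).
    """

    def __init__(self):
        super().__init__()
        self.keep_depth = 0   # > 0 while inside the kept <article>
        self.skip_depth = 0   # > 0 while inside a removed <div>
        self.text = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get('class')
        if self.keep_depth:
            self.keep_depth += 1
            if self.skip_depth:
                self.skip_depth += 1
            elif (tag, cls) in REMOVE:
                self.skip_depth = 1
        elif (tag, cls) == KEEP:
            self.keep_depth = 1

    def handle_endtag(self, tag):
        if self.keep_depth:
            self.keep_depth -= 1
            if self.skip_depth:
                self.skip_depth -= 1

    def handle_data(self, data):
        if self.keep_depth and not self.skip_depth and data.strip():
            self.text.append(data.strip())


# Hypothetical, simplified article page (not the real Irish Times markup).
PAGE = """
<html><body>
<div class="header">Site menu</div>
<article class="article row">
  <h1>Headline</h1>
  <p>Body text.</p>
  <div class="topics_holder"><a href="#">Topics</a></div>
  <div class="social_article_share"><span>Share</span></div>
</article>
<div class="footer">Footer</div>
</body></html>
"""

parser = ToyRecipeFilter()
parser.feed(PAGE)
parser.close()
print(parser.text)  # ['Headline', 'Body text.']
```

On a real page calibre keeps the matching elements themselves (markup and all), not just their text; the toy only collects text so the result is easy to check.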
03-12-2013, 11:07 AM #3
oneillpt
Connoisseur
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
Quote:
Originally Posted by oneillpt
It looks as if some further changes may be needed to the chosen feeds.
Some of the old RSS feeds are now redirected to new feeds; others simply fail. "frontpage", "ireland" and "world" are all redirected to a "news" feed, which for me only extracts when redirected from "frontpage". "finance", "features", "sport" and "opinion" still seem to extract. "letters" redirects to a new "Debate" feed, which no longer contains the letters, and nothing is extracted through that redirect (I viewed the feed in a browser).

I no longer see a set of RSS feeds listed as before. These may now be in the process of being phased out in favour of RSS feeds tied to the subscription ePaper - the "Quick User Guide" for the "Newspaper replica view" on the Subscription/Epaper page has an item "Click on [icon] to create an RSS feed to the front page or entire newspaper".

With home delivery of the printed paper already, I'm not going to subscribe to the ePaper as well. If I find a stable set of feeds which continue to work in Calibre, I'll post again on this thread. Otherwise it will be a case of availing of the offer of a temporary ePaper subscription in place of home delivery when on holiday, which I hope will continue.
03-13-2013, 07:27 AM #4
leo738
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3
Yes, I've tried your suggested fix (full recipe below), but unfortunately it's still unusable. I see from an article on the Irish Times website that they are still tweaking it.

However, I wonder whether a fix will be possible.

Leo



Code:
__license__  = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns"
'''
irishtimes.com
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe

class IrishTimes(BasicNewsRecipe):
    title          = u'The Irish Times'
    encoding  = 'ISO-8859-15'
    __author__    = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns"
    language = 'en_IE'
    timefmt = ' (%A, %B %d, %Y)'


    oldest_article = 1.0
    max_articles_per_feed  = 100
    no_stylesheets = True
    simultaneous_downloads= 5

    r = re.compile(r'.*(?P<url>http://(www\.irishtimes\.com|rss\.feedsportal\.com/c)/.*\.html?).*')  # not used elsewhere in this recipe
    remove_tags    = [dict(name='div', attrs={'class':'footer'})]
    extra_css      = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt  }'

    feeds          = [
                      ('Frontpage', 'http://www.irishtimes.com/feeds/rss/newspaper/index.rss'),
                      ('Ireland', 'http://www.irishtimes.com/feeds/rss/newspaper/ireland.rss'),
                      ('World', 'http://www.irishtimes.com/feeds/rss/newspaper/world.rss'),
                      ('Finance', 'http://www.irishtimes.com/feeds/rss/newspaper/finance.rss'),
                      ('Features', 'http://www.irishtimes.com/feeds/rss/newspaper/features.rss'),
                      ('Sport', 'http://www.irishtimes.com/feeds/rss/newspaper/sport.rss'),
                      ('Opinion', 'http://www.irishtimes.com/feeds/rss/newspaper/opinion.rss'),
                      ('Letters', 'http://www.irishtimes.com/feeds/rss/newspaper/letters.rss'),
                      ('Magazine', 'http://www.irishtimes.com/feeds/rss/newspaper/magazine.rss'),
                      ('Health', 'http://www.irishtimes.com/feeds/rss/newspaper/health.rss'),
                      ('Education & Parenting', 'http://www.irishtimes.com/feeds/rss/newspaper/education.rss'),
                      ('Motors', 'http://www.irishtimes.com/feeds/rss/newspaper/motors.rss'),
                      ('An Teanga Bheo', 'http://www.irishtimes.com/feeds/rss/newspaper/anteangabheo.rss'),
                      ('Commercial Property', 'http://www.irishtimes.com/feeds/rss/newspaper/commercialproperty.rss'),
                      ('Science Today', 'http://www.irishtimes.com/feeds/rss/newspaper/sciencetoday.rss'),
                      ('Property', 'http://www.irishtimes.com/feeds/rss/newspaper/property.rss'),
                      ('The Tickets', 'http://www.irishtimes.com/feeds/rss/newspaper/theticket.rss'),
                      ('Weekend', 'http://www.irishtimes.com/feeds/rss/newspaper/weekend.rss'),
                      ('News features', 'http://www.irishtimes.com/feeds/rss/newspaper/newsfeatures.rss'),
                      ('Obituaries', 'http://www.irishtimes.com/feeds/rss/newspaper/obituaries.rss'),
                    ]


    def print_version(self, url):
        if url.count('rss.feedsportal.com'):
            #u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
            u = url.find('irishtimes')
            u = 'http://www.irishtimes.com' + url[u + 12:]
            u = u.replace('0C', '/')
            u = u.replace('A', '')
            u = u.replace('0Bhtml/story01.htm', '_pf.html')
        else:
            u = url.replace('.html','_pf.html')
        return u

    def get_article_url(self, article):
        return article.link
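The `print_version` method above maps a Feedsportal redirect link back to the old site's printer-friendly (`_pf.html`) page. Its replacements suggest that Feedsportal encoded `/` as `0C`, `0` as `0A` (so deleting every `A` restores the zeros) and `.` as `0B`, and the `+ 12` skips `irishtimes` plus the two-character code after it. The same string logic is copied below into a plain function so it runs without calibre. The sample link is hypothetical, built by hand to follow that inferred encoding; the `0L`, `0S` and `0N` codes standing for `http://`, `www.` and `.com` are likewise assumptions.

```python
def print_version(url):
    """Same string logic as the recipe's print_version, minus calibre."""
    if 'rss.feedsportal.com' in url:
        # Skip past 'irishtimes' (10 chars) plus the 2-char code that follows it.
        start = url.find('irishtimes')
        u = 'http://www.irishtimes.com' + url[start + 12:]
        u = u.replace('0C', '/')   # inferred: encoded '/'
        u = u.replace('A', '')     # inferred: turns encoded '0A' back into '0'
        u = u.replace('0Bhtml/story01.htm', '_pf.html')
    else:
        u = url.replace('.html', '_pf.html')
    return u


# Hypothetical Feedsportal link, constructed for illustration only.
feed_link = ('http://rss.feedsportal.com/c/example/l/'
             '0L0Sirishtimes0N0Cnewspaper0Cireland0C20A130C0A312'
             '0C12243312340Bhtml/story01.htm')
print(print_version(feed_link))
# http://www.irishtimes.com/newspaper/ireland/2013/0312/1224331234_pf.html

direct_link = 'http://www.irishtimes.com/newspaper/ireland/2013/0312/1224331234.html'
print(print_version(direct_link))
# http://www.irishtimes.com/newspaper/ireland/2013/0312/1224331234_pf.html
```

One consequence of deleting every capital `A`: a URL that genuinely contained one would be mangled. That was harmless for the old all-lower-case article URLs.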
03-13-2013, 06:21 PM #5
frisket
Member
Posts: 14
Karma: 10
Join Date: May 2011
Device: Kindle
Where is the recipe?

Quote:
Originally Posted by oneillpt
The following are the essential changes to get content extracted again:
The only place I can find the recipe is in /opt/calibre/resources/builtin_recipes.zip

Is that really where it's kept? Or should there be a disk file for irish_times?
Quote:
Originally Posted by oneillpt
Code:
encoding  = 'UTF-8'
instead of
Code:
encoding  = 'ISO-8859-15'
Code:
keep_only_tags  = [dict(name='article', attrs={'class':'article row'})]
instead of any existing keep_only_tags
I didn't find any keep_only_tags.

Quote:
Originally Posted by oneillpt
Code:
remove_tags    = [dict(name='div', attrs={'class':'topics_holder'}),
                  dict(name='div', attrs={'class':'social_article_share'})]
instead of any existing remove_tags.
That was too easy :-)

Thanks for the pointers!
03-13-2013, 06:37 PM #6
frisket
Member
Posts: 14
Karma: 10
Join Date: May 2011
Device: Kindle
Quote:
Originally Posted by leo738
Yes, I've tried your suggested fix (full recipe below), but unfortunately it's still unusable. I see from an article on the Irish Times website that they are still tweaking it.

However, I wonder whether a fix will be possible.
It sounds as if we need to wait until it settles down. I went through the log file of an attempt just now, using the fixes oneillpt posted, and there are dozens of broken links (RSS feeds that no longer exist). With some effort, it should be possible to identify them by inspection and find their equivalents (if any) on the new website.

However, given the Irish news industry's ignorance of the Internet, and linking in particular, I wouldn't hold out too much hope that they will actually expose feeds for much longer, as they don't seem to want people to link to them.

///Peter
03-14-2013, 06:46 AM #7
leo738
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3
Extract from:

http://oldbugs.calibre-ebook.com/wiki/RecipeTips



NOTE: you are strongly advised NOT to edit the built-in recipes directly from the recipes folder!

The second method is the recommended one and here is how you go about it.

In the main window of calibre click the little arrow next to the "Fetch News" button and then click on "Add a custom news source".
A new window opens up and on the bottom left corner click on "Customize builtin recipe".
Now a little window opens up with a drop-down box where you can pick the recipe of the news source you wish to customize.
Once you have chosen a particular news source it should appear in the list on the left column of the window.
Select it in the left column and the recipe will appear on the right column of the window.
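On frisket's install, the built-in recipes ship inside `builtin_recipes.zip`. If you only want to read one, for instance to compare it with your customised copy, a read-only look with Python's standard `zipfile` module is enough, and it respects the advice above not to edit the built-in files. A minimal sketch; the member name `irish_times.recipe` and the function names are assumptions, so list the archive first to confirm the actual name.

```python
import os
import zipfile


def list_builtin_recipes(zip_path):
    """Return the sorted names of the .recipe files inside the zip."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(n for n in zf.namelist() if n.endswith('.recipe'))


def read_builtin_recipe(zip_path, member='irish_times.recipe'):
    """Return one recipe's source text; the zip itself is never modified.

    The default member name is an assumption: check list_builtin_recipes().
    """
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(member).decode('utf-8')


# Location reported earlier in this thread for a Linux install.
ZIP_PATH = '/opt/calibre/resources/builtin_recipes.zip'

if os.path.exists(ZIP_PATH):
    print([n for n in list_builtin_recipes(ZIP_PATH) if 'irish' in n])
    print(read_builtin_recipe(ZIP_PATH)[:300])
```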
03-30-2013, 09:18 AM #8
leo738
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3
Hello All,

I had a look at some of the links, and it's possible to get the recipe working, but it's not as extensive as the previous version: the Magazine and lots of other sections are missing. It's a shame, but at least it's something. The only sections now are:
  1. News
  2. Business
  3. Debate
  4. Life Style
  5. Culture
  6. Sport

I notice the feed links end in numbers (e.g. 1.1319192), which may be subject to change. We'll have to wait and see!


Here's the recipe:

Code:
__license__  = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns, 2013 Modified by O. O'H"
'''
irishtimes.com
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe

class IrishTimes(BasicNewsRecipe):
    title          = u'The Irish Times'
    encoding  = 'ISO-8859-15'
    __author__    = "Derry FitzGerald, Ray Kinsella, David O'Callaghan, Phil Burns & O. O'H"
    language = 'en_IE'
    timefmt = ' (%A, %B %d, %Y)'

    oldest_article = 1.0
    max_articles_per_feed  = 100
    no_stylesheets = True
    simultaneous_downloads= 5

    r = re.compile(r'.*(?P<url>http://(www\.irishtimes\.com|rss\.feedsportal\.com/c)/.*\.html?).*')  # not used elsewhere in this recipe
    keep_only_tags  = [dict(name='article', attrs={'class':'article row'})]
    remove_tags    = [dict(name='div', attrs={'class':'footer'})]
    extra_css      = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt  }'

    feeds          = [
                      ('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
                      ('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
                      ('Debate', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
                      ('Life Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
                      ('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
                      ('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
                    ]

    def print_version(self, url):
        if url.count('rss.feedsportal.com'):
            #u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
            u = url.find('irishtimes')
            u = 'http://www.irishtimes.com' + url[u + 12:]
            u = u.replace('0C', '/')
            u = u.replace('A', '')
            u = u.replace('0Bhtml/story01.htm', '_pf.html')
        else:
            u = url.replace('.html','_pf.html')
        return u

    def get_article_url(self, article):
        return article.link
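Two things about the feed list above can be checked mechanically. First, 'Debate' uses exactly the same URL as 'News' (`the-irish-times-news-1.1319192`), so as posted the Debate section would simply fetch the News feed a second time; the correct Debate URL isn't given in this thread. Second, since the numeric suffixes may change, it helps to record them. A small standalone sketch in plain Python; `shared_feed_urls` and `feed_ids` are hypothetical helper names, not part of calibre.

```python
from collections import defaultdict

# Feed list copied from the recipe above.
feeds = [
    ('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
    ('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
    ('Debate', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
    ('Life Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
    ('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
    ('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
]


def shared_feed_urls(feeds):
    """Map each URL used by more than one section to those section titles."""
    titles_by_url = defaultdict(list)
    for title, url in feeds:
        titles_by_url[url].append(title)
    return {url: titles for url, titles in titles_by_url.items() if len(titles) > 1}


def feed_ids(feeds):
    """Return the numeric suffix of each cmlink URL, e.g. '1.1319192'."""
    return {title: url.rsplit('-', 1)[-1] for title, url in feeds}


print(shared_feed_urls(feeds))
# {'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192': ['News', 'Debate']}
print(feed_ids(feeds)['Sport'])  # 1.1319194
```

Saving the `feed_ids` output somewhere makes it easy to notice later if the site renumbers its feeds.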
03-30-2013, 10:03 AM #9
smooth
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2013
Device: Kindle
Thanks very much for the update. Just some feedback for you.

I've tried it with the Kindle 3, and within stories the apostrophes and quote marks get screwed up: they are replaced by an â followed by two question marks, each in a box.

The á in Tánaiste also gets screwed up, but á doesn't get printed much, and anyway, it's only the Tánaiste.
04-01-2013, 07:14 AM #10
leo738
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3
Hello,

I stand to be corrected, but I think it's something to do with the encoding:

Code:
encoding  = 'ISO-8859-15'

You might try instead:
Code:
encoding  = 'UTF-8'
This is the change suggested in the second post. Hopefully it solves the problem. With that change, the recipe would be:

Code:
__license__  = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns, 2013 Modified by O. O'H"
'''
irishtimes.com
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe

class IrishTimes(BasicNewsRecipe):
    title          = u'The Irish Times'
    encoding  = 'UTF-8'
    __author__    = "Derry FitzGerald, Ray Kinsella, David O'Callaghan, Phil Burns & O. O'H"
    language = 'en_IE'
    timefmt = ' (%A, %B %d, %Y)'

    oldest_article = 1.0
    max_articles_per_feed  = 100
    no_stylesheets = True
    simultaneous_downloads= 5

    r = re.compile(r'.*(?P<url>http://(www\.irishtimes\.com|rss\.feedsportal\.com/c)/.*\.html?).*')  # not used elsewhere in this recipe
    keep_only_tags  = [dict(name='article', attrs={'class':'article row'})]
    remove_tags    = [dict(name='div', attrs={'class':'footer'})]
    extra_css      = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt  }'

    feeds          = [
                      ('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
                      ('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
                      ('Debate', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),  # same URL as 'News'; correct Debate feed URL not known
                      ('Life Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
                      ('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
                      ('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
                    ]

    def print_version(self, url):
        if url.count('rss.feedsportal.com'):
            #u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
            u = url.find('irishtimes')
            u = 'http://www.irishtimes.com' + url[u + 12:]
            u = u.replace('0C', '/')
            u = u.replace('A', '')
            u = u.replace('0Bhtml/story01.htm', '_pf.html')
        else:
            u = url.replace('.html','_pf.html')
        return u

    def get_article_url(self, article):
        return article.link
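The garbling smooth reported is what you would expect if UTF-8 bytes are decoded as ISO-8859-15, which is why the `encoding` line matters. A curly apostrophe (U+2019) is the three bytes E2 80 99 in UTF-8; read as ISO-8859-15 they become `â` followed by two C1 control characters, which a Kindle could plausibly render as boxed question marks. Likewise `á` (C3 A1) turns into `Ã¡`. The sketch below only demonstrates that mechanism in plain Python; that the new site actually serves UTF-8 is the assumption behind the suggested change.

```python
def misdecode(text):
    """Encode as UTF-8, then wrongly decode the bytes as ISO-8859-15."""
    return text.encode('utf-8').decode('iso-8859-15')


apostrophe = '\u2019'  # right single quotation mark, the usual curly apostrophe
print(apostrophe.encode('utf-8'))                    # b'\xe2\x80\x99'
print([hex(ord(c)) for c in misdecode(apostrophe)])  # ['0xe2', '0x80', '0x99']
print(misdecode('T\u00e1naiste'))                    # TÃ¡naiste

# Decoding with the matching codec round-trips cleanly.
print(apostrophe.encode('utf-8').decode('utf-8') == apostrophe)  # True
```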

Last edited by leo738; 04-01-2013 at 07:20 AM. Reason: Change
04-01-2013, 08:13 AM #11
leo738
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3
It looks like the photos and headlines could do with being resized. Does anybody know a solution?

Leo

Tags
irish times, recipe



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Irish Times - Recipe Problem | leo738 | Recipes | 10 | 08-31-2011 12:15 PM
Irish Times Recipe problem | mbro | Recipes | 3 | 04-16-2011 08:11 AM
Modified Irish Times Recipe | phiznlil | Recipes | 2 | 04-01-2011 06:27 AM
Updated New York Times recipe | nickredding | Recipes | 2 | 11-20-2010 10:53 AM
Irish Times recipe - no longer working | patrickpc | Recipes | 1 | 11-17-2010 12:16 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.