#1
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3

Updated Irish Times recipe?
Hello All,
The Irish Times website was updated over the last weekend and, following that, the recipe seems to be broken. Has anybody come up with an update?
Thanks,
Leo

#2
Connoisseur
Posts: 63
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis

Quote:
Code:
encoding = 'UTF-8'
Code:
encoding = 'ISO-8859-15'
Code:
keep_only_tags = dict(name='article', attrs={'class':'article row'})
Code:
remove_tags = [dict(name='div', attrs={'class':'topics_holder'}), dict(name='div', attrs={'class':'social_article_share'})]
I'm not posting a complete recipe: mine is rather heavily customised to extract only new articles, but to extract all on one chosen day each week. It looks as if some further changes may be needed for the chosen feeds, and I'll add another post here if I find any, but the changes above should get things going again for now.
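To make the effect of the keep_only_tags and remove_tags settings above more concrete, here is a minimal sketch of the filtering idea using only the Python standard library. It is not calibre's implementation (calibre applies these specs with its own HTML parser). The sample page and the names `matches` and `filter_page` are made up for illustration; only the tag and class names come from the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (real pages are messier).
SAMPLE = """<html><body>
<div class="header">Site navigation</div>
<article class="article row">
<h1>Headline</h1>
<div class="topics_holder">Topics: politics</div>
<p>Story text.</p>
<div class="social_article_share">Share this</div>
</article>
<div class="footer">Footer links</div>
</body></html>"""

KEEP = [dict(name='article', attrs={'class': 'article row'})]
REMOVE = [dict(name='div', attrs={'class': 'topics_holder'}),
          dict(name='div', attrs={'class': 'social_article_share'})]


def matches(elem, spec):
    """True if elem has the tag name and exact attribute values in spec."""
    if elem.tag != spec['name']:
        return False
    return all(elem.get(k) == v for k, v in spec.get('attrs', {}).items())


def filter_page(html, keep_only_tags, remove_tags):
    """Return the kept elements, with unwanted children stripped out."""
    root = ET.fromstring(html)
    # Step 1: keep only the elements matching one of the keep specs.
    kept = [e for e in root.iter()
            if any(matches(e, s) for s in keep_only_tags)]
    # Step 2: inside each kept element, drop anything matching a remove spec.
    for top in kept:
        for parent in list(top.iter()):
            for child in list(parent):
                if any(matches(child, s) for s in remove_tags):
                    parent.remove(child)
    return kept


kept = filter_page(SAMPLE, KEEP, REMOVE)
print(' '.join(''.join(kept[0].itertext()).split()))
```

In this sketch the header and footer disappear because they sit outside the kept article element, while the topics and share boxes disappear because they match a remove spec.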

#3
Connoisseur
Posts: 63
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis

Quote:
I no longer see a set of RSS feeds listed as before. They may now be in the process of being phased out in favour of RSS feeds tied to the subscription ePaper: the "Quick User Guide" for the "Newspaper replica view" on the Subscription/Epaper page has an item "Click on [icon] to create an RSS feed to the front page or entire newspaper". As I already have home delivery of the printed paper, I'm not going to subscribe to the ePaper as well. If I find a stable set of feeds which continue to work in Calibre, I'll post again on this thread. Otherwise it will be a case of availing of the offer of a temporary ePaper subscription in place of home delivery when on holiday, which I hope will continue.

#4
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3

Yes, indeed. I've tried your suggested fix (full recipe below), but unfortunately it's still unusable. I see from an article on the Irish Times website that they are still tweaking it.
However, I wonder whether a fix will be possible?
Leo
Code:
__license__ = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns"
'''
irishtimes.com
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe


class IrishTimes(BasicNewsRecipe):
    title = u'The Irish Times'
    encoding = 'ISO-8859-15'
    __author__ = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns"
    language = 'en_IE'
    timefmt = ' (%A, %B %d, %Y)'

    oldest_article = 1.0
    max_articles_per_feed = 100
    no_stylesheets = True
    simultaneous_downloads = 5

    r = re.compile(r'.*(?P<url>http:\/\/(www.irishtimes.com)|(rss.feedsportal.com\/c)\/.*\.html?).*')
    remove_tags = [dict(name='div', attrs={'class':'footer'})]
    extra_css = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt }'

    feeds = [
        ('Frontpage', 'http://www.irishtimes.com/feeds/rss/newspaper/index.rss'),
        ('Ireland', 'http://www.irishtimes.com/feeds/rss/newspaper/ireland.rss'),
        ('World', 'http://www.irishtimes.com/feeds/rss/newspaper/world.rss'),
        ('Finance', 'http://www.irishtimes.com/feeds/rss/newspaper/finance.rss'),
        ('Features', 'http://www.irishtimes.com/feeds/rss/newspaper/features.rss'),
        ('Sport', 'http://www.irishtimes.com/feeds/rss/newspaper/sport.rss'),
        ('Opinion', 'http://www.irishtimes.com/feeds/rss/newspaper/opinion.rss'),
        ('Letters', 'http://www.irishtimes.com/feeds/rss/newspaper/letters.rss'),
        ('Magazine', 'http://www.irishtimes.com/feeds/rss/newspaper/magazine.rss'),
        ('Health', 'http://www.irishtimes.com/feeds/rss/newspaper/health.rss'),
        ('Education & Parenting', 'http://www.irishtimes.com/feeds/rss/newspaper/education.rss'),
        ('Motors', 'http://www.irishtimes.com/feeds/rss/newspaper/motors.rss'),
        ('An Teanga Bheo', 'http://www.irishtimes.com/feeds/rss/newspaper/anteangabheo.rss'),
        ('Commercial Property', 'http://www.irishtimes.com/feeds/rss/newspaper/commercialproperty.rss'),
        ('Science Today', 'http://www.irishtimes.com/feeds/rss/newspaper/sciencetoday.rss'),
        ('Property', 'http://www.irishtimes.com/feeds/rss/newspaper/property.rss'),
        ('The Tickets', 'http://www.irishtimes.com/feeds/rss/newspaper/theticket.rss'),
        ('Weekend', 'http://www.irishtimes.com/feeds/rss/newspaper/weekend.rss'),
        ('News features', 'http://www.irishtimes.com/feeds/rss/newspaper/newsfeatures.rss'),
        ('Obituaries', 'http://www.irishtimes.com/feeds/rss/newspaper/obituaries.rss'),
    ]

    def print_version(self, url):
        if url.count('rss.feedsportal.com'):
            #u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
            u = url.find('irishtimes')
            u = 'http://www.irishtimes.com' + url[u + 12:]
            u = u.replace('0C', '/')
            u = u.replace('A', '')
            u = u.replace('0Bhtml/story01.htm', '_pf.html')
        else:
            u = url.replace('.html','_pf.html')
        return u

    def get_article_url(self, article):
        return article.link

#5
Member
Posts: 14
Karma: 10
Join Date: May 2011
Device: Kindle

Where is the recipe?
Quote:
Is that really where it's kept? Or should there be a disk file for irish_times?
Quote:
Quote:
Thanks for the pointers!

#6
Member
Posts: 14
Karma: 10
Join Date: May 2011
Device: Kindle

Quote:
However, given the Irish news industry's ignorance of the Internet, and linking in particular, I wouldn't hold out too much hope that they will actually expose feeds for much longer, as they don't seem to want people to link to them.
///Peter

#7
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3

Extract from:
http://oldbugs.calibre-ebook.com/wiki/RecipeTips
NOTE: you are strongly advised NOT to edit the built-in recipes directly from the recipes folder! The second method is the recommended one and here is how you go about it.
In the main window of calibre click the little arrow next to the "Fetch News" button and then click on "Add a custom news source". A new window opens up and on the bottom left corner click on "Customize builtin recipe". Now a little window opens up with a drop down box where you can pick the recipe of the news source you wish to customize. Once you have chosen a particular news source it should appear in the list on the left column of the window. Select it in the left column and the recipe will appear on the right column of the window.

#8
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3

Hello All,
I had a look at some of the links & it's possible to get the recipe working, but it's not as extensive as the previous version: the magazine & lots of other sections are missing. It's a shame, but at least it's something. The only sections are now:
News, Business, Debate, Life Style, Culture, Sport
I notice the links contain numbers at the end which may be subject to change; will have to wait & see! Here's the recipe:
Code:
__license__ = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns, 2013 Modified by O. O'H"
'''
irishtimes.com
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe


class IrishTimes(BasicNewsRecipe):
    title = u'The Irish Times'
    encoding = 'ISO-8859-15'
    __author__ = "Derry FitzGerald, Ray Kinsella, David O'Callaghan, Phil Burns & O. O'H"
    language = 'en_IE'
    timefmt = ' (%A, %B %d, %Y)'

    oldest_article = 1.0
    max_articles_per_feed = 100
    no_stylesheets = True
    simultaneous_downloads = 5

    r = re.compile(r'.*(?P<url>http:\/\/(www.irishtimes.com)|(rss.feedsportal.com\/c)\/.*\.html?).*')
    keep_only_tags = dict(name='article', attrs={'class':'article row'})
    remove_tags = [dict(name='div', attrs={'class':'footer'})]
    extra_css = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt }'

    feeds = [
        ('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
        ('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
        # NOTE: same URL as 'News' above, possibly a copy-paste slip;
        # a separate Debate feed URL is not given in this thread.
        ('Debate', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
        ('Life Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
        ('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
        ('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
    ]

    def print_version(self, url):
        if url.count('rss.feedsportal.com'):
            #u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
            u = url.find('irishtimes')
            u = 'http://www.irishtimes.com' + url[u + 12:]
            u = u.replace('0C', '/')
            u = u.replace('A', '')
            u = u.replace('0Bhtml/story01.htm', '_pf.html')
        else:
            u = url.replace('.html','_pf.html')
        return u

    def get_article_url(self, article):
        return article.link
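Since the feed links are said to be subject to change, one cheap sanity check on a feeds list like the one in the recipe above is whether two sections point at the same URL. The sketch below is plain Python that runs outside calibre; the helper name `duplicate_feed_urls` is made up for illustration, and the list is copied from the recipe as posted.

```python
from collections import defaultdict

# Feed list copied from the recipe as posted in this thread.
feeds = [
    ('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
    ('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
    ('Debate', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
    ('Life Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
    ('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
    ('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
]


def duplicate_feed_urls(feeds):
    """Map each URL used by more than one section to the titles using it."""
    by_url = defaultdict(list)
    for title, url in feeds:
        by_url[url].append(title)
    return {url: titles for url, titles in by_url.items() if len(titles) > 1}


for url, titles in duplicate_feed_urls(feeds).items():
    print(', '.join(titles), '->', url)
```

Run against the list as posted, this reports that News and Debate share one URL, so the Debate section would presumably just repeat the News articles until a separate feed URL is found.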

#9
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2013
Device: Kindle

Thanks very much for the update. Just some feedback for you.
I've tried it with the Kindle 3 and, within stories, apostrophes and quote marks get screwed up. They get replaced by a combination of â and then two question marks, each in a box. The á in Tánaiste also gets screwed up, but á doesn't get printed much, and anyway, it's only the Tánaiste.

#10
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3

Hello,
I stand to be corrected but I think it's something to do with the encoding:
Code:
encoding = 'ISO-8859-15'
You might try instead:
Code:
encoding = 'UTF-8'
Code:
__license__ = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns, 2013 Modified by O. O'H"
'''
irishtimes.com
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe


class IrishTimes(BasicNewsRecipe):
    title = u'The Irish Times'
    encoding = 'UTF-8'
    __author__ = "Derry FitzGerald, Ray Kinsella, David O'Callaghan, Phil Burns & O. O'H"
    language = 'en_IE'
    timefmt = ' (%A, %B %d, %Y)'

    oldest_article = 1.0
    max_articles_per_feed = 100
    no_stylesheets = True
    simultaneous_downloads = 5

    r = re.compile(r'.*(?P<url>http:\/\/(www.irishtimes.com)|(rss.feedsportal.com\/c)\/.*\.html?).*')
    keep_only_tags = dict(name='article', attrs={'class':'article row'})
    remove_tags = [dict(name='div', attrs={'class':'footer'})]
    extra_css = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt }'

    feeds = [
        ('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
        ('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
        # NOTE: same URL as 'News' above, possibly a copy-paste slip;
        # a separate Debate feed URL is not given in this thread.
        ('Debate', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
        ('Life Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
        ('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
        ('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
    ]

    def print_version(self, url):
        if url.count('rss.feedsportal.com'):
            #u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
            u = url.find('irishtimes')
            u = 'http://www.irishtimes.com' + url[u + 12:]
            u = u.replace('0C', '/')
            u = u.replace('A', '')
            u = u.replace('0Bhtml/story01.htm', '_pf.html')
        else:
            u = url.replace('.html','_pf.html')
        return u

    def get_article_url(self, article):
        return article.link
Last edited by leo738; 04-01-2013 at 07:20 AM. Reason: Change
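The garbling reported earlier in the thread (an â followed by two boxed question marks in place of an apostrophe) is consistent with UTF-8 bytes being decoded as ISO-8859-15. The short sketch below reproduces that effect in plain Python. It assumes the redesigned site serves UTF-8, which the thread does not confirm, and `decode_as` is a made-up helper name.

```python
def decode_as(text, wrong_encoding):
    """Encode text as UTF-8, then decode those bytes with another encoding."""
    return text.encode('utf-8').decode(wrong_encoding)


# A curly apostrophe, a curly opening quote, and a word with an accented a.
samples = ['\u2019', '\u201c', 'T\u00e1naiste']

for s in samples:
    garbled = decode_as(s, 'iso-8859-15')
    # ascii() makes the invisible control characters visible as escapes.
    print(ascii(s), '->', ascii(garbled))
```

Each curly quote turns into â followed by two control characters, which many e-readers draw as boxes, and á turns into a two-character sequence. Decoding the same bytes as UTF-8 gives back the original text, which is why changing the recipe's encoding line is a plausible fix.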

#11
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: Kindle 3

It looks like the photos & headlines could do with being resized. Does anybody know a solution?
Leo
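One possible direction for the resizing question, offered as an untested sketch rather than a confirmed fix: calibre's BasicNewsRecipe documents a `scale_news_images` attribute giving the maximum (width, height) to scale images to, and headline size can be adjusted through `extra_css`. The class name `IrishTimesResized`, the 600x800 figure (the Kindle 3 screen size) and the `h1` selector are assumptions; the two attributes would be merged into the existing recipe class rather than used on their own. The fallback class exists only so the fragment can be read outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Placeholder so this fragment can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass


class IrishTimesResized(BasicNewsRecipe):
    title = u'The Irish Times'

    # Maximum (width, height) in pixels for downloaded images.
    # Assumption: 600x800 matches the Kindle 3 screen.
    scale_news_images = (600, 800)

    # Assumption: article headlines are <h1> elements on the new site;
    # the selector may need adjusting after inspecting a downloaded page.
    extra_css = 'h1 { font-size: large; }'
```

Whether `scale_news_images` is available depends on the calibre version in use, so this is worth checking against the installed version's recipe documentation before relying on it.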
Tags |
irish times, recipe |