03-31-2011, 08:07 PM | #1 |
Member
Posts: 16
Karma: 12
Join Date: Mar 2011
Device: kindle 3
|
Modified Irish Times Recipe
Here is an improvement to The Irish Times Recipe.
I added all the weekly feeds, so the weekly articles are also in the paper, as in the Amazon edition. I also removed the blank lines between paragraphs and indented the paragraphs, again as in the Amazon edition. I added charset ISO-8859-15; is that correct? It seems to work best with all the fadas (acute accents) in Irish and the euro symbol. Code:
__license__ = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns"
'''
irishtimes.com
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe


class IrishTimes(BasicNewsRecipe):
    title = u'The Irish Times'
    encoding = 'ISO-8859-15'
    __author__ = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns"
    language = 'en_IE'
    timefmt = ' (%A, %B %d, %Y)'

    oldest_article = 1.0
    max_articles_per_feed = 100
    no_stylesheets = True
    simultaneous_downloads = 5

    # Pattern for article URLs; compiled here but not used elsewhere in this recipe.
    r = re.compile(r'.*(?P<url>http:\/\/(www.irishtimes.com)|(rss.feedsportal.com\/c)\/.*\.html?).*')
    remove_tags = [dict(name='div', attrs={'class': 'footer'})]
    # No margins between paragraphs, with a small first-line indent.
    extra_css = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt }'

    feeds = [
        ('Frontpage', 'http://www.irishtimes.com/feeds/rss/newspaper/index.rss'),
        ('Ireland', 'http://www.irishtimes.com/feeds/rss/newspaper/ireland.rss'),
        ('World', 'http://www.irishtimes.com/feeds/rss/newspaper/world.rss'),
        ('Finance', 'http://www.irishtimes.com/feeds/rss/newspaper/finance.rss'),
        ('Features', 'http://www.irishtimes.com/feeds/rss/newspaper/features.rss'),
        ('Sport', 'http://www.irishtimes.com/feeds/rss/newspaper/sport.rss'),
        ('Opinion', 'http://www.irishtimes.com/feeds/rss/newspaper/opinion.rss'),
        ('Letters', 'http://www.irishtimes.com/feeds/rss/newspaper/letters.rss'),
        ('Magazine', 'http://www.irishtimes.com/feeds/rss/newspaper/magazine.rss'),
        ('Health', 'http://www.irishtimes.com/feeds/rss/newspaper/health.rss'),
        ('Education & Parenting', 'http://www.irishtimes.com/feeds/rss/newspaper/education.rss'),
        ('Motors', 'http://www.irishtimes.com/feeds/rss/newspaper/motors.rss'),
        ('An Teanga Bheo', 'http://www.irishtimes.com/feeds/rss/newspaper/anteangabheo.rss'),
        ('Commercial Property', 'http://www.irishtimes.com/feeds/rss/newspaper/commercialproperty.rss'),
        ('Science Today', 'http://www.irishtimes.com/feeds/rss/newspaper/sciencetoday.rss'),
        ('Property', 'http://www.irishtimes.com/feeds/rss/newspaper/property.rss'),
        ('The Tickets', 'http://www.irishtimes.com/feeds/rss/newspaper/theticket.rss'),
        ('Weekend', 'http://www.irishtimes.com/feeds/rss/newspaper/weekend.rss'),
        ('News features', 'http://www.irishtimes.com/feeds/rss/newspaper/newsfeatures.rss'),
        ('Obituaries', 'http://www.irishtimes.com/feeds/rss/newspaper/obituaries.rss'),
    ]

    def print_version(self, url):
        # Rewrite the article URL to its printer-friendly ("_pf") variant.
        if url.count('rss.feedsportal.com'):
            u = url.replace('0Bhtml/story01.htm', '_pf0Bhtml/story01.htm')
        else:
            u = url.replace('.html', '_pf.html')
        return u

    def get_article_url(self, article):
        return article.link
|
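For anyone wanting to check the URL rewriting without running a full download, here is a minimal standalone sketch of the same logic as print_version in the recipe above. The two sample URLs are made up for illustration (assumptions, not real article links); only the replace patterns come from the recipe.

```python
def print_version(url):
    """Mirror the recipe's print_version as a plain function."""
    if 'rss.feedsportal.com' in url:
        # Proxied feed links: insert "_pf" before the encoded "html" suffix.
        return url.replace('0Bhtml/story01.htm', '_pf0Bhtml/story01.htm')
    # Direct links: insert "_pf" before the ".html" extension.
    return url.replace('.html', '_pf.html')


# Hypothetical URLs, shaped only to exercise both branches.
direct = 'http://www.irishtimes.com/newspaper/ireland/2011/0331/example.html'
proxied = 'http://rss.feedsportal.com/c/000/f/000/s/example0Bhtml/story01.htm'

direct_pf = print_version(direct)
proxied_pf = print_version(proxied)
print(direct_pf)
print(proxied_pf)
```

A URL that matches neither pattern comes back unchanged, since str.replace does nothing when the substring is absent.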
03-31-2011, 10:44 PM | #2 | |
Connoisseur
Posts: 62
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
|
Quote:
The Irish Times pages are declared as us-ascii, which implies 7-bit ASCII codes. If that is followed consistently, all these "special" characters have to be encoded as HTML entities. I'm not sure whether the defined entities extend to traditional script, but the séimhiú can be handled with roman letters, for example "&#7682;" for Ḃ (B séimhiú), "&#7683;" for ḃ (b séimhiú), etc.
If 7-bit ASCII encoding is followed consistently, with HTML entities for every character outside the 7-bit set, it should not matter whether the encoding is specified as ISO-8859-15, ISO-8859-1, Windows-1252 or UTF-8, as all these encodings are identical for the 7-bit codes.
Have you noticed any instances where the Euro symbol or any letters with fadas (or any other special character, for that matter) appeared incorrectly? |
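The point about the 7-bit codes can be checked with a small sketch like the one below. The sample bytes are invented for illustration (an assumption, not taken from an actual Irish Times page); the idea is simply that pure-ASCII bytes with numeric entities decode to the same text under all four encodings, while a raw 8-bit byte does not.

```python
import html

# Invented sample: 7-bit ASCII bytes only, special characters as numeric entities.
raw = b'S&#233;amus &#8364;5 &#7682;&#7683;'

encodings = ['iso-8859-15', 'iso-8859-1', 'windows-1252', 'utf-8']
decoded = {enc: html.unescape(raw.decode(enc)) for enc in encodings}

for enc, text in decoded.items():
    # !a prints non-ASCII characters as escapes, so this works on any console.
    print(f'{enc:13} {text!a}')

all_same = len(set(decoded.values())) == 1
print('identical under all encodings:', all_same)

# By contrast, a raw 8-bit byte depends on the declared encoding:
# 0xA4 is the euro sign in ISO-8859-15 but the currency sign in ISO-8859-1.
euro_latin9 = b'\xa4'.decode('iso-8859-15')
sign_latin1 = b'\xa4'.decode('iso-8859-1')
print(f'{euro_latin9!a} vs {sign_latin1!a}')
```

So the declared charset would only start to matter if the site emitted raw bytes above 0x7F instead of entities.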
|
04-01-2011, 06:27 AM | #3 |
Member
Posts: 16
Karma: 12
Join Date: Mar 2011
Device: kindle 3
|
Hmm, no, I can't recall actually noticing any problems with incorrect characters. I did notice that the pages are declared as us-ascii, but I also noticed that the news feeds themselves have encoding=utf-8 in the header when you subscribe to them, and I was a little confused. I simply defaulted to ISO-8859-15 out of habit.
So anyway, as you say, it shouldn't matter, as all the special characters in the articles do seem to be encoded as entities. Thanks for clearing that up, oneillpt.
Last edited by phiznlil; 04-01-2011 at 06:34 AM. |
|