Old 03-31-2011, 08:07 PM   #1
phiznlil
Member
phiznlil began at the beginning.
 
Posts: 16
Karma: 12
Join Date: Mar 2011
Device: kindle 3
Modified Irish Times Recipe

Here is an improved version of the Irish Times recipe.

I added all the weekly feeds, so the weekly articles now appear in the paper as they do in the Amazon edition. I also removed the blank lines between paragraphs and indented the paragraphs, again as in the Amazon edition.

I added the charset ISO-8859-15. Is that correct? It seems to work best with all the fadas (acute accents) in Irish and the euro symbol.

Code:
__license__   = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns"
'''
irishtimes.com
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe

class IrishTimes(BasicNewsRecipe):
    title          = u'The Irish Times'
    encoding  = 'ISO-8859-15'
    __author__     = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns"
    language = 'en_IE'
    timefmt = ' (%A, %B %d, %Y)'


    oldest_article = 1.0
    max_articles_per_feed  = 100
    no_stylesheets = True
    simultaneous_downloads= 5

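    # Matches article URLs on either host. The alternation is grouped so that
    # both hosts require a path ending in .htm or .html.
    r = re.compile(r'.*(?P<url>http://(?:www\.irishtimes\.com|rss\.feedsportal\.com/c)/.*\.html?).*')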
    remove_tags    = [dict(name='div', attrs={'class':'footer'})]
    extra_css      = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt  }'

    feeds          = [
                      ('Frontpage', 'http://www.irishtimes.com/feeds/rss/newspaper/index.rss'),
                      ('Ireland', 'http://www.irishtimes.com/feeds/rss/newspaper/ireland.rss'),
                      ('World', 'http://www.irishtimes.com/feeds/rss/newspaper/world.rss'),
                      ('Finance', 'http://www.irishtimes.com/feeds/rss/newspaper/finance.rss'),
                      ('Features', 'http://www.irishtimes.com/feeds/rss/newspaper/features.rss'),
                      ('Sport', 'http://www.irishtimes.com/feeds/rss/newspaper/sport.rss'),
                      ('Opinion', 'http://www.irishtimes.com/feeds/rss/newspaper/opinion.rss'),
                      ('Letters', 'http://www.irishtimes.com/feeds/rss/newspaper/letters.rss'),
                      ('Magazine', 'http://www.irishtimes.com/feeds/rss/newspaper/magazine.rss'),
                      ('Health', 'http://www.irishtimes.com/feeds/rss/newspaper/health.rss'),
                      ('Education & Parenting', 'http://www.irishtimes.com/feeds/rss/newspaper/education.rss'),
                      ('Motors', 'http://www.irishtimes.com/feeds/rss/newspaper/motors.rss'),
                      ('An Teanga Bheo', 'http://www.irishtimes.com/feeds/rss/newspaper/anteangabheo.rss'),
                      ('Commercial Property', 'http://www.irishtimes.com/feeds/rss/newspaper/commercialproperty.rss'),
                      ('Science Today', 'http://www.irishtimes.com/feeds/rss/newspaper/sciencetoday.rss'),
                      ('Property', 'http://www.irishtimes.com/feeds/rss/newspaper/property.rss'),
                      ('The Ticket', 'http://www.irishtimes.com/feeds/rss/newspaper/theticket.rss'),
                      ('Weekend', 'http://www.irishtimes.com/feeds/rss/newspaper/weekend.rss'),
                      ('News features', 'http://www.irishtimes.com/feeds/rss/newspaper/newsfeatures.rss'),
                      ('Obituaries', 'http://www.irishtimes.com/feeds/rss/newspaper/obituaries.rss'),
                    ]


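    def print_version(self, url):
        # Rewrite the article URL to its printer-friendly ("_pf") variant.
        if 'rss.feedsportal.com' in url:
            # Feedsportal links carry the page name inside the encoded path.
            return url.replace('0Bhtml/story01.htm', '_pf0Bhtml/story01.htm')
        return url.replace('.html', '_pf.html')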

    def get_article_url(self, article):
        return article.link
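
For anyone wanting to check the URL rewriting without running a full download, the logic of print_version can be tried on its own. The sketch below copies it into a plain function. The name to_print_url and both sample URLs are made up for illustration and are not real article links.

```python
def to_print_url(url):
    """Standalone copy of the recipe's print_version logic, for testing."""
    if 'rss.feedsportal.com' in url:
        return url.replace('0Bhtml/story01.htm', '_pf0Bhtml/story01.htm')
    return url.replace('.html', '_pf.html')

# Hypothetical URLs, shaped like the two kinds of link the recipe handles.
direct = 'http://www.irishtimes.com/newspaper/example/article.html'
proxied = 'http://rss.feedsportal.com/c/example/article0Bhtml/story01.htm'

print(to_print_url(direct))
print(to_print_url(proxied))
```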
Old 03-31-2011, 10:44 PM   #2
oneillpt
Connoisseur
oneillpt began at the beginning.
 
Posts: 54
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1
Quote:
Originally Posted by phiznlil
I added the charset ISO-8859-15. Is that correct? It seems to work best with all the fadas (acute accents) in Irish and the euro symbol.
The encoding does not appear to be critical for the Irish Times recipe. The potential encoding problem with the Euro symbol is avoided because the pages use HTML entities: "& #8364;" for the Euro symbol, "& #237;" for i fada, etc. (without the space between & and #, inserted here to display the entity rather than the resulting character). Vowels with fadas would in any case be encoded identically in ISO-8859-15, ISO-8859-1 and Windows-1252.
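
A quick way to check the claim about fadas is to encode the characters under each charset and compare the bytes. This is a minimal Python 3 sketch using only the standard codecs. The sample characters and the helper name encoded_bytes are chosen here for illustration and are not part of the recipe.

```python
# Sketch: compare how the three single-byte charsets encode Irish characters.
CHARSETS = ('iso-8859-15', 'iso-8859-1', 'cp1252')  # cp1252 is Windows-1252

def encoded_bytes(char):
    """Map each charset name to the bytes for char, or None if unencodable."""
    result = {}
    for name in CHARSETS:
        try:
            result[name] = char.encode(name)
        except UnicodeEncodeError:
            result[name] = None
    return result

for ch in 'áéíóúÁÉÍÓÚ€':
    print(ch, encoded_bytes(ch))
```

Each fada vowel gives the same single byte in all three charsets. The Euro symbol is the odd one out: it has different byte values in ISO-8859-15 and Windows-1252, and it cannot be encoded in ISO-8859-1 at all.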

The Irish Times pages are declared as us-ascii, which implies 7-bit ASCII codes and which, if followed consistently, would require all these "special" characters to be encoded as HTML entities. I'm not sure whether the defined entities extend to traditional script, but the séimhiú can be handled with roman letters, for example "& #7682;" for Ḃ (B séimhiú), "& #7683;" for ḃ (b séimhiú), etc.
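
To see what those numeric entities resolve to, Python's standard html.unescape can be used. This is only an illustration of entity decoding in Python 3 and is not part of the recipe.

```python
from html import unescape

# Numeric character references mentioned above, written without the extra space.
entities = ['&#8364;', '&#237;', '&#7682;', '&#7683;']

for entity in entities:
    char = unescape(entity)
    # Show the entity, the resulting character and its Unicode code point.
    print(entity, char, 'U+%04X' % ord(char))
```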

If 7-bit ASCII encoding, with HTML entities for all characters outside the 7-bit set, is followed consistently, it should not matter whether the encoding is specified as ISO-8859-15, ISO-8859-1, Windows-1252 or UTF-8, as all these encodings are identical for the 7-bit codes. Have you noticed any instances where the Euro symbol or any letters with fadas (or any other special character for that matter) appeared incorrectly?
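
This can be checked with a small Python 3 sketch: a byte string containing only 7-bit ASCII (here a made-up fragment, not taken from a real Irish Times page) decodes to the same text under every one of these encodings.

```python
# Hypothetical page fragment: pure ASCII, with special characters as entities.
page_bytes = b'<p>Price &#8364;5, D&#250;n Laoghaire</p>'

ENCODINGS = ('iso-8859-15', 'iso-8859-1', 'cp1252', 'utf-8')

# Every byte is below 128, so the fragment really is 7-bit ASCII.
assert max(page_bytes) < 128

# Decode the same bytes under each encoding and compare the results.
decoded = {name: page_bytes.decode(name) for name in ENCODINGS}
print(len(set(decoded.values())) == 1)
```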
Old 04-01-2011, 06:27 AM   #3
phiznlil
Member
phiznlil began at the beginning.
 
Posts: 16
Karma: 12
Join Date: Mar 2011
Device: kindle 3
Hmm, no, I can't recall actually noticing any problems with incorrect characters. I did notice that the pages were encoded as us-ascii, but I also noticed that the news feeds themselves, when subscribed to, have encoding=utf-8 in the header, and I was a little confused. I simply defaulted to ISO-8859-15 out of habit.

So anyway, as you say, it shouldn't matter, as all the characters in the articles do seem to be entitised. Thanks for clearing that up, oneillpt.

Last edited by phiznlil; 04-01-2011 at 06:34 AM.