Old 03-01-2011, 03:17 PM   #1
oneillpt
Recipe for Helsingin Sanomat

This is certainly a minority linguistic interest, since no Finnish news is included with Calibre yet, but it may also be useful as an example for anyone encountering problems with new recipes caused by HTML tables in the feed content.

Helsingin Sanomat places the feed content within HTML <table> tags. Without the "'linearize_tables' : True" entry in conversion_options below, the resulting e-book in mobi format shows only a single page for each article, both on Kindle and in the MobiPocket reader for PC. Everything after the part which fits on that first page is lost.

The recipe also shows how to handle printable page versions (the "tulosta" URL below) when the RSS feeds supply the article URL in two different forms, with or without a "?ref=rss" at the end.


Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1298137661(BasicNewsRecipe):
  title          = u'Helsingin Sanomat'
  oldest_article = 7
  max_articles_per_feed = 100
  no_stylesheets = True
  remove_javascript     = True
  conversion_options = {
                         'linearize_tables' : True 
                       }
  remove_tags = [
                  dict(name='a', attrs={'id':'articleCommentUrl'}),
                  dict(name='p', attrs={'class':'newsSummary'}),
                  dict(name='div', attrs={'class':'headerTools'})
                ]

  feeds = [
            (u'Uutiset - HS.fi', u'http://www.hs.fi/uutiset/rss/'),
            (u'Politiikka - HS.fi', u'http://www.hs.fi/politiikka/rss/'),
            (u'Ulkomaat - HS.fi', u'http://www.hs.fi/ulkomaat/rss/'),
            (u'Kulttuuri - HS.fi', u'http://www.hs.fi/kulttuuri/rss/'),
            (u'Kirjat - HS.fi', u'http://www.hs.fi/kulttuuri/kirjat/rss/'),
            (u'Elokuvat - HS.fi', u'http://www.hs.fi/kulttuuri/elokuvat/rss/')
          ]

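  def print_version(self, url):
    # Keep only the last path segment of the article URL, with its leading "/"
    j = url.rfind("/")
    s = url[j:]
    # Some feeds append "?ref=rss"; strip it so both URL forms give the same page
    i = s.rfind("?ref=rss")
    if i > 0:
      s = s[:i]
    # The printable version lives under /tulosta/<last segment>
    return "http://www.hs.fi/tulosta" + s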
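
For anyone who wants to try the URL rewriting outside Calibre, here is a minimal standalone sketch of the same logic as print_version above. The function name to_print_url and the two article URLs are made up for illustration (they are not real HS.fi articles); only the "?ref=rss" suffix and the "http://www.hs.fi/tulosta" prefix come from the recipe.

```python
PRINT_PREFIX = "http://www.hs.fi/tulosta"  # taken from the recipe above
RSS_SUFFIX = "?ref=rss"                    # taken from the recipe above


def to_print_url(url):
    """Map a feed article URL to its printable ("tulosta") URL.

    Hypothetical helper name; mirrors print_version in the recipe.
    """
    # Last path segment, including the leading "/"
    tail = url[url.rfind("/"):]
    # Drop the "?ref=rss" marker if the feed added one
    cut = tail.rfind(RSS_SUFFIX)
    if cut > 0:
        tail = tail[:cut]
    return PRINT_PREFIX + tail


# Hypothetical example URLs, one in each of the two forms the feeds use
with_ref = "http://www.hs.fi/kulttuuri/artikkeli/1135264172345?ref=rss"
without_ref = "http://www.hs.fi/kulttuuri/artikkeli/1135264172345"

print(to_print_url(with_ref))
print(to_print_url(without_ref))
```

Both calls should print the same printable URL, which is the point of stripping the suffix before building the "tulosta" address.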