MobileRead Forums - View Single Post

oneillpt · 03-01-2011, 03:17 PM

This is certainly a minority linguistic interest, as no Finnish news sources are included with Calibre yet, but it may also be useful as an example for anyone running into problems with new recipes caused by HTML tables in the feed content.

Helsingin Sanomat places the feed content within HTML <table> tags. Without the "'linearize_tables' : True" entry in conversion_options below, the resulting e-book in MOBI format shows only a single page for each article, both on Kindle and in the MobiPocket reader for PC. Everything after the part of the article that fits on that first page is lost.
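To make the idea concrete, here is a rough sketch of what linearizing a table amounts to: the table markup is swapped for plain block elements so the text flows like ordinary paragraphs. This is only an illustration and not Calibre's own code. The helper name linearize_tables_sketch, the tag list, and the sample HTML are all made up for the example.

```python
import re

# Table-related tags to flatten. This list is an assumption for illustration only.
TABLE_TAGS = ("table", "tbody", "thead", "tfoot", "tr", "td", "th")

# Matches an opening or closing tag from the list, including any attributes.
_TAG_RE = re.compile(r"<(/?)(%s)\b[^>]*>" % "|".join(TABLE_TAGS), re.IGNORECASE)


def linearize_tables_sketch(html):
    """Hypothetical helper: replace table markup with plain <div> blocks."""
    # group(1) is "/" for a closing tag and "" for an opening tag.
    return _TAG_RE.sub(lambda m: "<%sdiv>" % m.group(1), html)


# Made-up sample in the spirit of an article wrapped in a layout table.
sample = '<table width="100%"><tr><td>First paragraph.</td></tr></table>'
flattened = linearize_tables_sketch(sample)
```

In a real recipe none of this is needed, since the conversion option does the work during conversion.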

The recipe also illustrates how to handle printable page versions (the "tulosta" URL below; "tulosta" is Finnish for "print"), where the RSS feeds supply the article URL in two different forms, with or without a "?ref=rss" at the end.

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1298137661(BasicNewsRecipe):
  title          = u'Helsingin Sanomat'
  oldest_article = 7
  max_articles_per_feed = 100
  no_stylesheets = True
  remove_javascript     = True
  # Flatten the <table> markup around each article so the MOBI output is not cut off
  conversion_options = {
                         'linearize_tables' : True
                       }
  remove_tags = [
                  dict(name='a', attrs={'id':'articleCommentUrl'}),
                  dict(name='p', attrs={'class':'newsSummary'}),
                  dict(name='div', attrs={'class':'headerTools'})
                ]

  feeds          = [(u'Uutiset - HS.fi', u'http://www.hs.fi/uutiset/rss/'), (u'Politiikka - HS.fi', u'http://www.hs.fi/politiikka/rss/'),
                     (u'Ulkomaat - HS.fi', u'http://www.hs.fi/ulkomaat/rss/'), (u'Kulttuuri - HS.fi', u'http://www.hs.fi/kulttuuri/rss/'),
                     (u'Kirjat - HS.fi', u'http://www.hs.fi/kulttuuri/kirjat/rss/'), (u'Elokuvat - HS.fi', u'http://www.hs.fi/kulttuuri/elokuvat/rss/')
                     ]

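  def print_version(self, url):
    # Take the last path segment of the article URL, including its leading "/"
    j = url.rfind("/")
    s = url[j:]
    # Some feed entries end in "?ref=rss"; strip it when present (rfind returns -1 if absent)
    i = s.rfind("?ref=rss")
    if i > 0:
      s = s[:i]
    # Build the printable ("tulosta") page URL from the cleaned segment
    return "http://www.hs.fi/tulosta" + s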
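As a quick check of the URL handling, the same logic can be tried outside Calibre as a plain function. The two article URLs here are hypothetical: the path layout and the ID 1234567890 are invented for the example, and only the "?ref=rss" suffix and the "tulosta" prefix come from the recipe.

```python
def print_version(url):
    # Same logic as the recipe method, without the class wrapper.
    j = url.rfind("/")
    s = url[j:]                 # last path segment, with its leading "/"
    i = s.rfind("?ref=rss")     # -1 when the suffix is absent
    if i > 0:
        s = s[:i]
    return "http://www.hs.fi/tulosta" + s


# Hypothetical article URLs in the two forms the feeds supply.
with_ref = "http://www.hs.fi/kulttuuri/artikkeli/1234567890?ref=rss"
without_ref = "http://www.hs.fi/kulttuuri/artikkeli/1234567890"

# Both forms should map to the same printable page.
assert print_version(with_ref) == print_version(without_ref)
print(print_version(with_ref))
```

Both forms end up at the same printable address, so the recipe does not need to know which form a given feed entry uses.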

Post #1 in thread: Recipe for Helsingin Sanomat