This is certainly of minority linguistic interest, since Calibre does not yet include any Finnish news sources, but it may also be useful as an example for anyone having problems with new recipes caused by HTML tables in the feed content.
Helsingin Sanomat places the feed content within HTML <table> tags. Without the "'linearize_tables' : True" entry in conversion_options below, the resulting e-book in mobi format shows only a single page for each article, both on the Kindle and in the MobiPocket reader for PC. Everything after the part that fits on that first page is lost.
The recipe also illustrates handling of printable page versions (the "tulosta" path below), where the RSS feeds supply the article URL in two different forms: with or without a "?ref=rss" at the end.
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1298137661(BasicNewsRecipe):
    title = u'Helsingin Sanomat'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True

    # The article body sits inside <table> tags; without this option the
    # mobi output is cut off after the first page of each article.
    conversion_options = {
        'linearize_tables': True
    }

    remove_tags = [
        dict(name='a', attrs={'id': 'articleCommentUrl'}),
        dict(name='p', attrs={'class': 'newsSummary'}),
        dict(name='div', attrs={'class': 'headerTools'})
    ]

    feeds = [
        (u'Uutiset - HS.fi', u'http://www.hs.fi/uutiset/rss/'),
        (u'Politiikka - HS.fi', u'http://www.hs.fi/politiikka/rss/'),
        (u'Ulkomaat - HS.fi', u'http://www.hs.fi/ulkomaat/rss/'),
        (u'Kulttuuri - HS.fi', u'http://www.hs.fi/kulttuuri/rss/'),
        (u'Kirjat - HS.fi', u'http://www.hs.fi/kulttuuri/kirjat/rss/'),
        (u'Elokuvat - HS.fi', u'http://www.hs.fi/kulttuuri/elokuvat/rss/')
    ]

    def print_version(self, url):
        # Keep only the last path segment of the article URL.
        j = url.rfind("/")
        s = url[j:]
        # Strip the "?ref=rss" suffix if the feed added one.
        i = s.rfind("?ref=rss")
        if i > 0:
            s = s[:i]
        return "http://www.hs.fi/tulosta" + s
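
To check the URL handling without running a full Calibre download, the print_version logic can be copied into a plain function and tried on both URL forms. This is only a sketch: the name to_print_url and the two article URLs are made up for illustration and are not taken from the actual feeds.

```python
def to_print_url(url):
    """Standalone copy of the recipe's print_version logic (hypothetical helper name)."""
    # Keep only the last path segment, including its leading slash.
    tail = url[url.rfind("/"):]
    # Drop the "?ref=rss" suffix when the feed supplies it.
    marker = tail.rfind("?ref=rss")
    if marker > 0:
        tail = tail[:marker]
    return "http://www.hs.fi/tulosta" + tail


# Made-up article URLs in the two forms described above (assumption, not real links).
with_ref = "http://www.hs.fi/kulttuuri/artikkeli/Esimerkki/1135264000000?ref=rss"
without_ref = "http://www.hs.fi/kulttuuri/artikkeli/Esimerkki/1135264000000"

print(to_print_url(with_ref))
print(to_print_url(without_ref))
```

Both calls should produce the same printable-page URL, which is the point of stripping the suffix.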