10-14-2011, 10:48 AM   #3
oneillpt
Quote:
Originally Posted by Tragos
This recipe is not working anymore as Helsingin Sanomat has changed their website structure. Nowadays the print versions of the pages are created using JavaScript.
Here is a revised version, which extracts the main news (Uutiset) section. However, the book (Kirjat) and cinema (Elokuvat) sections, which the original version could still extract, are broken by this revision.

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1298137661(BasicNewsRecipe):
  title                 = u'Helsingin Sanomat'
  __author__            = 'oneillpt custom'
  language              = 'fi'
  oldest_article        = 7
  max_articles_per_feed = 100
  no_stylesheets        = True
  remove_javascript     = True
  conversion_options    = {
                            'linearize_tables' : True
                          }
  #remove_tags = [
  #                dict(name='a', attrs={'id':'articleCommentUrl'}),
  #                dict(name='p', attrs={'class':'newsSummary'}),
  #                dict(name='div', attrs={'class':'headerTools'})
  #              ]
  keep_only_tags = [dict(name='div', attrs={'id':'main-content'})]

  feeds = [
            (u'Uutiset - HS.fi', u'http://www.hs.fi/uutiset/rss/'),
            # The feeds below are not working with this revision:
            # (u'Politiikka - HS.fi', u'http://www.hs.fi/politiikka/rss/'),
            # (u'Ulkomaat - HS.fi', u'http://www.hs.fi/ulkomaat/rss/'),
            # (u'Kulttuuri - HS.fi', u'http://www.hs.fi/kulttuuri/rss/'),
            # (u'Kirjat - HS.fi', u'http://www.hs.fi/kulttuuri/kirjat/rss/'),
            # (u'Elokuvat - HS.fi', u'http://www.hs.fi/kulttuuri/elokuvat/rss/'),
          ]

  #def print_version(self, url):
  #  j = url.rfind("/")
  #  s = url[j:]
  #  i = s.rfind("?ref=rss")
  #  if i > 0:
  #    s = s[:i]
  #  return "http://www.hs.fi/tulosta" + s


The revision removes the remove_tags lines and the print_version definition, and adds a keep_only_tags line that keeps only the div with id 'main-content'. I have retained the removed lines as comments, and commented out the feeds that no longer work. I'll post a new version if I can get those feeds working with the same recipe that now works for the main news feed.
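
For reference, here is a minimal standalone sketch of what the commented-out print_version did, so it is clearer why it had to go once the print pages became JavaScript-generated. The function name old_print_url and the article URL in the usage note are hypothetical, made up for illustration; only the string handling and the "http://www.hs.fi/tulosta" prefix are taken from the recipe above.

```python
def old_print_url(url):
    """Mimic the removed print_version: build the old print-page URL.

    Hypothetical helper name; the logic mirrors the commented-out method.
    """
    # Keep only the last path segment, including its leading "/".
    j = url.rfind("/")
    s = url[j:]
    # Drop the RSS tracking suffix if it is present.
    i = s.rfind("?ref=rss")
    if i > 0:
        s = s[:i]
    # The old site served print versions under /tulosta/<article id>.
    return "http://www.hs.fi/tulosta" + s
```

Calling old_print_url("http://www.hs.fi/kotimaa/artikkeli/Example/1135269999999?ref=rss") on that made-up URL gives "http://www.hs.fi/tulosta/1135269999999". Since that page is now built by JavaScript, the recipe instead fetches the normal article page and trims it with keep_only_tags.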