03-12-2013, 08:38 AM   #2
oneillpt
Quote:
Originally Posted by leo738
Hello All,

The Irish Times website was updated over the last weekend, and since then the recipe seems to be broken. Has anybody come up with an update?

Thanks,

Leo
The following are the essential changes to get content extracted again:
Code:
encoding  = 'UTF-8'
instead of
Code:
encoding  = 'ISO-8859-15'
Code:
keep_only_tags  = [dict(name='article', attrs={'class':'article row'})]
instead of any existing keep_only_tags

Code:
remove_tags    = [dict(name='div', attrs={'class':'topics_holder'}),
                  dict(name='div', attrs={'class':'social_article_share'})]
instead of any existing remove_tags.
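
To show where these three settings sit, here is a minimal sketch of a recipe class, not a complete or tested recipe. The class name and title are placeholders of my own, the feeds list is deliberately left empty (keep whatever feeds your existing recipe uses), and keep_only_tags is written as a list, which is the form calibre's recipe documentation uses. The fallback stub is only there so the snippet can be loaded outside calibre.

```python
try:
    # Available when the recipe is run by calibre itself.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be imported outside calibre (assumption).
    class BasicNewsRecipe(object):
        pass


class IrishTimesSketch(BasicNewsRecipe):
    # Placeholder title; use the one from your existing recipe.
    title = 'The Irish Times (sketch)'

    # The updated site serves UTF-8 rather than ISO-8859-15.
    encoding = 'UTF-8'

    # Keep only the article body.
    keep_only_tags = [dict(name='article', attrs={'class': 'article row'})]

    # Drop the topic links and the social sharing block.
    remove_tags = [dict(name='div', attrs={'class': 'topics_holder'}),
                   dict(name='div', attrs={'class': 'social_article_share'})]

    # Intentionally empty here: copy the feeds from your existing recipe.
    feeds = []
```

Everything else in an existing recipe (feeds, article limits and so on) can stay as it is for a first test.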

I'm not posting a complete recipe: mine is rather heavily customised to extract only new articles on most days, and all articles on one chosen day each week.
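
For anyone wanting something similar, the selection rule can be sketched in plain Python as below. This is not the code from my recipe, only an illustration of the idea: the function name, the `seen` set of already downloaded URLs, and the choice of Sunday as the full day are all assumptions, and hooking it into a recipe is left out.

```python
import datetime

# Hypothetical choice of the weekly "fetch everything" day.
# date.weekday() counts Monday as 0, so 6 is Sunday.
FULL_DAY = 6


def select_articles(urls, seen, today, full_day=FULL_DAY):
    """Return the article URLs to download today.

    urls  -- article URLs found in the feeds, in feed order
    seen  -- set of URLs downloaded on earlier runs
    today -- a datetime.date
    """
    if today.weekday() == full_day:
        # Weekly full run: take everything, seen or not.
        return list(urls)
    # Any other day: take only articles not downloaded before.
    return [u for u in urls if u not in seen]
```

The `seen` set would have to be saved between runs, for example in a small text file, which this sketch does not cover.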

The chosen feeds may need some further changes as well, and I'll add another post here if I find any, but the changes above should get things going again for now.