Quote:
Originally Posted by leo738
Hello All,
The Irish Times website was updated over the last weekend, and since then the recipe seems to be broken. Has anybody come up with an update?
Thanks,
Leo
The following are the essential changes to get content extracted again:
instead of
Code:
encoding = 'ISO-8859-15'
Code:
keep_only_tags = dict(name='article', attrs={'class':'article row'})
instead of any existing keep_only_tags
Code:
remove_tags = [dict(name='div', attrs={'class':'topics_holder'}),
               dict(name='div', attrs={'class':'social_article_share'})]
instead of any existing remove_tags.
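For anyone unsure where these lines go, here is a minimal sketch of a recipe skeleton with the two tag settings in place. The class name, title and feed URL are placeholders made up for illustration, not the real Irish Times values. The try/except is only there so the snippet also runs outside calibre. keep_only_tags is written as a list here, which is the form shown in the calibre documentation. The encoding line is left out, so set it in your own recipe.

```python
# Sketch only: shows where the tag settings sit in a calibre recipe.
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Outside calibre, fall back to a stand-in so the file still imports.
    class BasicNewsRecipe(object):
        pass


class IrishTimesSketch(BasicNewsRecipe):  # hypothetical class name
    title = 'The Irish Times (sketch)'  # hypothetical title

    # Keep only the main article container.
    keep_only_tags = [dict(name='article', attrs={'class': 'article row'})]

    # Drop the topic links and the social sharing bar inside the article.
    remove_tags = [
        dict(name='div', attrs={'class': 'topics_holder'}),
        dict(name='div', attrs={'class': 'social_article_share'}),
    ]

    # Placeholder feed, NOT a real Irish Times URL; keep your existing feeds.
    feeds = [('Example feed', 'https://example.com/rss')]
```

In an existing recipe you would only replace the matching attributes and leave the rest (feeds, title and so on) as they are.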
I'm not posting a complete recipe - mine is rather heavily customised to extract only new articles on most days, and to extract everything on one chosen day each week.
It looks as if the chosen feeds may need some further changes as well, and I'll add another post here if I find any, but the changes above should get things going again for now.