MobileRead Forums

oneillpt · 03-12-2013, 08:38 AM

Quote:

Originally Posted by leo738

Hello All,

The Irish Times website was updated over the last weekend, and since then the recipe seems to be broken. Has anybody come up with an update?

Thanks,

Leo

The following are the essential changes to get content extracted again:

Code:

encoding  = 'UTF-8'

instead of

Code:

encoding  = 'ISO-8859-15'
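A quick illustration of why this setting matters. It assumes the redesigned site now serves its pages as UTF-8, which the post implies but doesn't state. If UTF-8 bytes are decoded as ISO-8859-15, every accented character turns into two junk characters. The sample string is made up for illustration.

```python
# Illustrative sample text; assumes the site now sends UTF-8 bytes.
raw = "Fáilte, Taoiseach".encode("utf-8")

# What the old setting would produce: each two-byte UTF-8 sequence
# is misread as two separate ISO-8859-15 characters.
wrong = raw.decode("iso-8859-15")

# What encoding = 'UTF-8' produces: the original text.
right = raw.decode("utf-8")

print(wrong)  # FÃ¡ilte, Taoiseach
print(right)  # Fáilte, Taoiseach
```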

Code:

keep_only_tags = [dict(name='article', attrs={'class':'article row'})]

instead of any existing keep_only_tags
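Roughly speaking, keep_only_tags tells calibre to discard everything in the page body except the matching elements. The sketch below approximates that effect with Python's standard html.parser by collecting only the text inside `<article class="article row">`. It is not how calibre implements the option, since calibre keeps the matching tags themselves and not just their text. The class name `KeepOnlyArticle` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class KeepOnlyArticle(HTMLParser):
    """Collect text found inside <article class="article row">,
    roughly mimicking what keep_only_tags asks calibre to keep."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside the matching <article>
        self.chunks = []    # text pieces that would be kept

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested <article> tags so the right closing tag ends the match.
            if tag == 'article':
                self.depth += 1
        elif tag == 'article' and dict(attrs).get('class') == 'article row':
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == 'article':
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


# Made-up sample page: only the article text should survive.
page = ('<header>Site menu</header>'
        '<article class="article row"><h1>Headline</h1><p>Story text</p></article>'
        '<footer>Footer links</footer>')
parser = KeepOnlyArticle()
parser.feed(page)
parser.close()
print(parser.chunks)  # ['Headline', 'Story text']
```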

Code:

remove_tags    = [dict(name='div', attrs={'class':'topics_holder'}),
                  dict(name='div', attrs={'class':'social_article_share'})]

instead of any existing remove_tags.
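The effect of remove_tags can be approximated with Python's standard html.parser. This is not calibre's code, and the sample HTML and the names `DropUnwanted` and `UNWANTED` are hypothetical. The sketch drops the text of any `<div>` whose class is exactly `topics_holder` or `social_article_share`, including any divs nested inside them, and keeps everything else.

```python
from html.parser import HTMLParser

# (tag, class) pairs taken from the remove_tags list above.
UNWANTED = {('div', 'topics_holder'), ('div', 'social_article_share')}


class DropUnwanted(HTMLParser):
    """Collect page text, skipping anything inside an unwanted div."""

    def __init__(self):
        super().__init__()
        self.skip = 0       # > 0 while inside an unwanted div (counts nested divs)
        self.chunks = []    # text pieces that would be kept

    def handle_starttag(self, tag, attrs):
        if self.skip:
            if tag == 'div':
                self.skip += 1
        elif (tag, dict(attrs).get('class')) in UNWANTED:
            self.skip = 1

    def handle_endtag(self, tag):
        if self.skip and tag == 'div':
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())


# Made-up sample: the topics and share blocks should disappear.
page = ('<div class="topics_holder">Topics</div><p>Body text</p>'
        '<div class="social_article_share"><div>Share</div></div><p>More</p>')
parser = DropUnwanted()
parser.feed(page)
parser.close()
print(parser.chunks)  # ['Body text', 'More']
```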

I'm not posting a complete recipe: mine is rather heavily customised to extract only new articles, except on one chosen day each week, when it extracts all of them.
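The weekly behaviour described above could be approximated along the following lines. This is a hypothetical sketch and not the customised recipe mentioned in this post, which isn't posted. `select_articles`, `FULL_DAY`, the `seen` set, and the placeholder URLs are all illustrative names.

```python
import datetime

# Hypothetical: the weekday on which every article is fetched.
# date.weekday() counts Monday as 0, so 6 means Sunday.
FULL_DAY = 6


def select_articles(urls, seen, today, full_day=FULL_DAY):
    """Return the article URLs to download.

    On `full_day` every URL is returned; on other days only the URLs
    not already in `seen` (articles fetched on earlier runs).
    """
    if today.weekday() == full_day:
        return list(urls)
    return [url for url in urls if url not in seen]


# Illustrative placeholder URLs, not real Irish Times links.
urls = ["article-1", "article-2", "article-3"]
seen = {"article-1"}

print(select_articles(urls, seen, datetime.date(2013, 3, 12)))  # Tuesday: ['article-2', 'article-3']
print(select_articles(urls, seen, datetime.date(2013, 3, 17)))  # Sunday: ['article-1', 'article-2', 'article-3']
```

In a real recipe the `seen` set would have to be saved between runs (for example to a file), and the selection would be applied wherever the recipe builds its article list. Both of those steps are left out here.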

Some further changes related to the chosen feeds may also be needed. If I find any, I'll add another post here, but for now the changes above should get things going again.