Quote:
Originally Posted by TBR
I'm still having trouble getting a recipe for
http://p.yimg.com/bw/rss/nachrichten/bundeswehr.xml
cleared of unnecessary clutter; I am still getting artifacts.
The modified basic news recipe works in principle and removes much of the clutter, but the output still includes, among other things, a "ghost" of an ad.
Could anyone jump in with advice?
This is what you should put in your recipe for complete cleanup:
Code:
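# Strip the width/height attributes from all tags
remove_attributes = ['width', 'height']
# Discard everything before the headline and after the article body
remove_tags_before = dict(name='h1')
remove_tags_after = dict(name='div', attrs={'class': 'ynw-article-body mod'})
# Discard the leftover elements with these ids/classes
remove_tags = [
    dict(attrs={'id': ['ynw-image-video-inset', 'ynw-more-news']}),
    dict(attrs={'class': ['ynw-utility']}),
]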
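
In case it is unclear where those lines go, here is a rough sketch of a whole recipe built around them. Only the feed URL and the cleanup settings come from this thread; the class name, title, feed label and the values of oldest_article, max_articles_per_feed and no_stylesheets are placeholders I made up, so adjust them to your own recipe. The try/except at the top is only there so the file can also be loaded outside calibre for a quick check; inside calibre the real BasicNewsRecipe is used.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Not running inside calibre: minimal stand-in so the file still imports.
    class BasicNewsRecipe(object):
        pass


class BundeswehrNews(BasicNewsRecipe):  # hypothetical class name
    title = 'Yahoo! Nachrichten - Bundeswehr'  # placeholder title
    oldest_article = 7                         # placeholder, in days
    max_articles_per_feed = 100                # placeholder
    no_stylesheets = True                      # placeholder

    # Feed URL from the original question; the label is a placeholder.
    feeds = [
        ('Bundeswehr', 'http://p.yimg.com/bw/rss/nachrichten/bundeswehr.xml'),
    ]

    # Cleanup settings from the answer above.
    remove_attributes = ['width', 'height']
    remove_tags_before = dict(name='h1')
    remove_tags_after = dict(name='div', attrs={'class': 'ynw-article-body mod'})
    remove_tags = [
        dict(attrs={'id': ['ynw-image-video-inset', 'ynw-more-news']}),
        dict(attrs={'class': ['ynw-utility']}),
    ]
```

To try it out, save it under a name of your choice (say bundeswehr.recipe) and run something like `ebook-convert bundeswehr.recipe out.epub --test -vv`, which fetches only a couple of articles so you can see quickly whether the ad "ghost" is gone.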