Fixed some tag filters and removed unnecessary banners.
https://github.com/kovidgoyal/calibr...00db42e010677d
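Several filters in this change use calibre's `classes()` helper. As far as I know, it matches any element whose `class` attribute shares at least one name with the space-separated list you pass. Here is a simplified standalone sketch of that any-of rule. It is an approximation, not calibre's actual code, and `matches()` is a hypothetical helper that only exists for the demo:

```python
from typing import Callable, Optional


def classes(names: str) -> dict:
    """Approximation of a calibre-style class matcher: the element matches
    if its class attribute shares at least one name with `names`."""
    wanted = frozenset(names.split())
    return {
        "attrs": {
            "class": lambda value: bool(value)
            and bool(frozenset(value.split()) & wanted)
        }
    }


def matches(spec: dict, element_class: Optional[str]) -> bool:
    """Hypothetical helper: run the 'class' predicate on one element's class string."""
    predicate: Callable[[Optional[str]], bool] = spec["attrs"]["class"]
    return predicate(element_class)


spec = classes("share-social appstext storytags pdsc-related-modify news-guard")
print(matches(spec, "storytags clearfix"))  # True: shares 'storytags'
print(matches(spec, "story-body"))          # False: no shared class name
print(matches(spec, None))                  # False: no class attribute at all
```

Because one shared name is enough, a single `classes(...)` entry can replace several `dict(attrs={'class': ...})` entries.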
Code:
remove_attributes = ['style','height','width']
ignore_duplicate_articles = {'url'}
# classes() comes from: from calibre.web.feeds.news import BasicNewsRecipe, classes
keep_only_tags = [
    classes('heading-part full-details')
]
remove_tags = [
    # Breadcrumb navigation and the comments section
    dict(name='nav', attrs={'class':'ie-breadcrumb'}),
    dict(name='div', attrs={'id':'ie_story_comments'}),
    # div blocks carrying any of these classes (ad slot, read button, unitimg, copyright)
    dict(name='div', attrs={'class':['ie-int-campign-ad','custom_read_button','unitimg','copyright']}),
    # Banner images promoting the Explained and Opinion sections, plus their links
    dict(name='img', attrs={'src':'https://images.indianexpress.com/2021/06/explained-button-300-ie.jpeg'}),
    dict(name='a', attrs={'href':'https://indianexpress.com/section/explained/?utm_source=newbanner'}),
    dict(name='img', attrs={'src':'https://images.indianexpress.com/2021/06/opinion-button-300-ie.jpeg'}),
    dict(name='a', attrs={'href':'https://indianexpress.com/section/opinion/?utm_source=newbanner'}),
    # Any element carrying one of these class names
    classes('share-social appstext storytags pdsc-related-modify news-guard'),