Quote:
Originally Posted by JohnsonZA
I'm using keep_only_tags and remove_tags in a recipe for a site using divs with multiple classes like so:
Code:
<div class="article right">blah</div>
I'm struggling to keep the div based on its "article" class.
This is what I've tried, but it doesn't work:
Code:
keep_only_tags = [
    dict(name='div', attrs={'class':['article']})
]
It only works if I put both class names, like so:
Code:
keep_only_tags = [
    dict(name='div', attrs={'class':['article right']})
]
I've tried wildcards ('article.*'), but that doesn't seem to work either.
Does anyone have any ideas?
|
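The value of the class attribute is the whole string "article right", so a plain match on 'article' alone fails and the recipe is working as it should. If you want to match on just part of the class, use a regex: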
Code:
keep_only_tags = [
    dict(name='div', attrs={'class':re.compile(r'article', re.DOTALL|re.IGNORECASE)})
]
and don't forget to add this at the top of your recipe:
Code:
import re
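One caveat with a bare r'article' pattern: if the regex is searched against the whole class string (which is what the behaviour described above suggests), it would also hit classes like "articles" or "sub-article". Below is a rough sketch of a tighter pattern that only accepts "article" as a whole class name. The names article_class, loose and samples are made up for illustration, and the sample class strings are not taken from the actual site.

```python
import re

# Hypothetical tighter pattern: "article" must be a complete class token,
# i.e. bounded by whitespace or by the start/end of the class string.
article_class = re.compile(r'(?:^|\s)article(?:\s|$)', re.IGNORECASE)

# The looser pattern from the reply, kept here for comparison.
loose = re.compile(r'article', re.IGNORECASE)

# Made-up class attribute values to compare the two patterns.
samples = ['article right', 'right article', 'article', 'articles', 'sub-article']

# Assumes the regex is applied with search() to the full class string.
matches = {s: bool(article_class.search(s)) for s in samples}
loose_matches = {s: bool(loose.search(s)) for s in samples}

# Same recipe fragment as above, using the tighter pattern.
keep_only_tags = [
    dict(name='div', attrs={'class': article_class})
]
```

If the site only ever uses "article" as a standalone class, the simpler pattern is fine. The tighter one is only worth it when other class names happen to contain the same word.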