Old 09-25-2010, 10:33 AM   #2
Starson17
Wizard
Quote:
Originally Posted by JohnsonZA
I'm using keep_only_tags and remove_tags in a recipe for a site using divs with multiple classes like so:
Code:
<div class="article right">blah</div>
I'm struggling to keep the div based on its "article" class.

This is what I've tried, but it doesn't work:
Code:
keep_only_tags = [
    dict(name='div', attrs={'class':['article']})
]
It only works if I put both class names like so:
Code:
keep_only_tags = [
    dict(name='div', attrs={'class':['article right']})
]
I've tried wildcards ('article.*'), but that doesn't seem to work either.

Anyone have ideas?
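The value of the class attribute is the whole string "article right," and a plain string in attrs is compared against that whole value, so 'article' alone doesn't match and it's working as it should. If you want to match on just part of the class, use a regex on the class: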
Code:
keep_only_tags = [
    dict(name='div', attrs={'class': re.compile(r'article', re.DOTALL|re.IGNORECASE)})
]
and don't forget to add this at the top of the recipe:
import re
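If you want to see why the regex catches "article right" when the plain string doesn't, here is a rough standalone sketch using only the re module. It assumes the tag matcher applies the compiled pattern as a search over the full class string, which is my understanding of how it behaves but isn't shown above. The names class_matches and strict are made up for illustration; strict is an optional tighter pattern in case the loose one also picks up classes like "articles-list".

```python
import re

# Loose pattern, same idea as in the recipe above:
# matches any class value that contains "article" anywhere.
loose = re.compile(r'article', re.IGNORECASE)

# Hypothetical stricter variant (not from the original post):
# "article" must be a whole space-separated class token.
strict = re.compile(r'(?:^|\s)article(?:\s|$)', re.IGNORECASE)

def class_matches(pattern, class_value):
    """Return True if the pattern is found anywhere in the class string."""
    return pattern.search(class_value) is not None

# Try both patterns against a few class attribute values.
samples = ['article right', 'article', 'left article', 'articles-list', 'sidebar']
for value in samples:
    print(value, class_matches(loose, value), class_matches(strict, value))
```

If the loose pattern turns out to keep too much on your site, swapping in something like the stricter one in keep_only_tags should narrow it down, but test it against the actual page first.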