02-23-2011, 11:45 PM   #5
clintiepoo
 
Starson,

Thanks for your help on this. Unfortunately, the code hasn't worked for me. I think the problem is in how I'm chopping up the HTML. Here are my keep tags:

Code:
    keep_only_tags = [
        dict(name='h1'),
        dict(name='span', attrs={'class': 'updated'}),
        dict(name='img', attrs={'id': 'img-holder'}),
        dict(name='span', attrs={'id': 'gallery-cutline'}),
        dict(name='div', attrs={'id': 'blox-story-text'}),
    ]
My suspicion is that, because I'm calling out the img directly and the other pieces as 'span', the code you gave doesn't have the flexibility to work with this. It runs fine; it just isn't doing anything for me.
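
To make the concern concrete, here is a rough sketch of which elements a list like the keep tags above would pick out. It is only an approximation under stated assumptions: it uses Python's standard html.parser rather than calibre's own matching, it compares attribute values by simple equality, and the sample HTML, KEEP_SPECS, matches, KeepMatcher and kept_tags are all made-up names for illustration, not the real herald-review.com markup or any calibre API.

```python
from html.parser import HTMLParser

# The same specs as the keep tags above, written as plain dicts.
KEEP_SPECS = [
    {"name": "h1"},
    {"name": "span", "attrs": {"class": "updated"}},
    {"name": "img", "attrs": {"id": "img-holder"}},
    {"name": "span", "attrs": {"id": "gallery-cutline"}},
    {"name": "div", "attrs": {"id": "blox-story-text"}},
]

# Hypothetical page fragment, invented for this sketch.
SAMPLE_HTML = """
<div id="header"><span class="nav">Home</span></div>
<h1>Headline</h1>
<span class="updated">Updated today</span>
<img id="img-holder" src="photo.jpg">
<span id="gallery-cutline">Photo caption</span>
<div id="blox-story-text"><p>Story body</p></div>
<div id="footer">Footer</div>
"""


def matches(tag, attrs, spec):
    """Return True if a tag name and its attribute dict satisfy one spec."""
    if spec["name"] != tag:
        return False
    wanted = spec.get("attrs", {})
    # Simplification: exact string equality on each requested attribute.
    return all(attrs.get(key) == value for key, value in wanted.items())


class KeepMatcher(HTMLParser):
    """Records every start tag that matches at least one keep spec."""

    def __init__(self, specs):
        super().__init__()
        self.specs = specs
        self.kept = []

    def handle_starttag(self, tag, attrs):
        attr_dict = dict(attrs)
        if any(matches(tag, attr_dict, spec) for spec in self.specs):
            label = attr_dict.get("id") or attr_dict.get("class")
            self.kept.append((tag, label))


def kept_tags(html, specs=KEEP_SPECS):
    """Return (tag, id-or-class) pairs for elements the specs would select."""
    parser = KeepMatcher(specs)
    parser.feed(html)
    return parser.kept


if __name__ == "__main__":
    for tag, label in kept_tags(SAMPLE_HTML):
        print(tag, label)
```

In this sketch each span and the img are selected as separate top-level pieces, which is why anything that expects them to share a common wrapper element may find nothing to work on.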

Would you mind looking at http://www.herald-review.com/news/lo...cc4c002e0.html as an example and seeing if you can come up with something better for the tags? I'd like to get rid of the spans, but I don't see how.