Starson,
Thanks for your help on this. Unfortunately, the code hasn't worked for me. I think the problem is in how I'm chopping up the HTML. Here are my keep tags:
Code:
keep_only_tags = [
    dict(name='h1'),
    dict(name='span', attrs={'class':'updated'}),
    dict(name='img', attrs={'id':'img-holder'}),
    dict(name='span', attrs={'id':'gallery-cutline'}),
    dict(name='div', attrs={'id':'blox-story-text'})
]
Because I'm calling out the img directly and pulling in the other pieces as spans, I'm not sure the code you gave is flexible enough to work with this setup. The recipe runs fine; your code just has no visible effect on my output. That's only my suspicion, though.
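To spell out what I think those keep tags are doing, here is a rough stand-alone sketch. It is only my mental model of the matching, not calibre's actual code: the helper names matches and is_kept are made up, the sample tags are invented rather than copied from the page, and the real matching (for instance on tags with several classes) may well behave differently.

```python
# Rough model of how I understand keep_only_tags matching.
# NOT calibre's real implementation; matches() and is_kept() are hypothetical helpers.
keep_only_tags = [
    dict(name='h1'),
    dict(name='span', attrs={'class': 'updated'}),
    dict(name='img', attrs={'id': 'img-holder'}),
    dict(name='span', attrs={'id': 'gallery-cutline'}),
    dict(name='div', attrs={'id': 'blox-story-text'}),
]


def matches(spec, tag_name, tag_attrs):
    """True if the tag has the spec's name and every attribute the spec lists."""
    if spec.get('name') != tag_name:
        return False
    wanted = spec.get('attrs', {})
    return all(tag_attrs.get(key) == value for key, value in wanted.items())


def is_kept(tag_name, tag_attrs):
    """True if any entry in keep_only_tags matches the tag."""
    return any(matches(spec, tag_name, tag_attrs) for spec in keep_only_tags)


# Invented sample tags, for illustration only.
print(is_kept('img', {'id': 'img-holder', 'src': 'photo.jpg'}))  # extra attrs are ignored
print(is_kept('span', {'class': 'byline'}))                      # span with a different class
print(is_kept('h1', {}))                                         # name-only entry
```

If that model is right, each kept element comes through as its own top-level piece (a bare img, a bare span) rather than inside a shared wrapper, which is why I suspect code that expects a containing element finds nothing to act on.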
Would you mind looking at
http://www.herald-review.com/news/lo...cc4c002e0.html for example and see if you can come up with something better on the tags? I'd like to get rid of the spans, but I don't see how.