vtblogger,
That worked very well, thank you! The last problem I'm running into is that I have one more link I want to keep. I think it should be easy enough using an elif or something, but I really struggle with this stuff. I appreciate your help so far.
Inside this mess, I would like to keep the text of the author's name. Before, I was just keeping that span with:
keep_only_tags ... dict(name='span', attrs={'class':'fn'}),
Now that span gets deleted along with the link by a.extract().
Code:
<a href="/search/?l=50&sd=desc&s=start_time&f=html&byline=By KURT ERICKSON, JG-TC Springfield Bureau">
<span class="author vcard"><span class="fn">By KURT ERICKSON, JG-TC Springfield Bureau</span></span>
</a>
<span class="hide source-org vcard"><span class="org fn">JG-TC.com</span></span>
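In this markup `fn` is a value of the `class` attribute on a `<span>`, not the name of a tag, so a BeautifulSoup lookup has to match on the attribute rather than on the tag name. A minimal sketch, assuming BeautifulSoup 4 (`bs4`) with Python's built-in `html.parser`, that compares the two kinds of lookup on the byline markup above:

```python
from bs4 import BeautifulSoup

# The byline markup from the post
html = """
<a href="/search/?l=50&sd=desc&s=start_time&f=html&byline=By KURT ERICKSON, JG-TC Springfield Bureau">
<span class="author vcard"><span class="fn">By KURT ERICKSON, JG-TC Springfield Bureau</span></span>
</a>
<span class="hide source-org vcard"><span class="org fn">JG-TC.com</span></span>
"""

soup = BeautifulSoup(html, "html.parser")
a = soup.find("a")

# find("fn") looks for a tag *named* <fn>; there is none, so this is None
by_name = a.find("fn")

# Matching on the class attribute finds the author's name span
by_class = a.find("span", attrs={"class": "fn"})

print(by_name)              # None
print(by_class.get_text())  # By KURT ERICKSON, JG-TC Springfield Bureau
```

The `find("span", attrs={"class": "fn"})` form is the same shape as the `dict(name='span', attrs={'class':'fn'})` entry already used in `keep_only_tags`. Note that BeautifulSoup 4 treats `class` as multi-valued, so `"fn"` would also match the `class="org fn"` span; here that span sits outside the `<a>`, so a search scoped to `a` never reaches it.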
This is my flawed code. Any more ideas? Did I mention I appreciate your help?!
for a in soup.findAll('a'):
    img = a.find('img')
    fn = a.find('fn')
    if img is not None:
        a.replaceWith(img)
    else:
        if fn is not None:
            a.replaceWith(fn)
        else:
            a.extract()
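A sketch of the same loop with the author lookup matching on the `class` attribute (a `<span>` with class `fn`) instead of a tag named `fn`, and the nested `else: if` written as `elif`. It assumes BeautifulSoup 4, where `find_all` and `replace_with` are the snake_case spellings of `findAll` and `replaceWith` (the camelCase names are still accepted). The sample HTML, including the `/photo` and `/elsewhere` links, is made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: an image link, the byline link, and a plain link
html = """<div>
<a href="/photo"><img src="/photo.jpg"></a>
<a href="/search/?byline=By KURT ERICKSON"><span class="author vcard"><span class="fn">By KURT ERICKSON, JG-TC Springfield Bureau</span></span></a>
<a href="/elsewhere">some other link</a>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list, so changing the tree inside the loop is safe
for a in soup.find_all("a"):
    img = a.find("img")
    # Match the author's name by tag and class, not by a tag named "fn"
    fn = a.find("span", attrs={"class": "fn"})
    if img is not None:
        # Keep the image, drop the link around it
        a.replace_with(img)
    elif fn is not None:
        # Keep the author's name span, drop the link around it
        a.replace_with(fn)
    else:
        # Any other link is removed together with its contents
        a.extract()

print(soup)
```

In a Calibre recipe this loop would presumably live in the recipe's `preprocess_html(self, soup)` method, which returns the modified `soup`.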