Old 01-03-2012, 11:36 PM   #8
clintiepoo
vtblogger,

That worked very well, thank you! The last problem I'm running into is that I have one more link I want to keep. I think it should be easy enough using an elif or something, but I seriously struggle with this stuff. I appreciate your help thus far.

Inside of this mess, I would like to keep the text of the author's name. Before, I was just keeping that span.

keep_only_tags ... dict(name='span', attrs={'class':'fn'}),

Now, this gets deleted with the a.extract()


Code:
                <a href="/search/?l=50&sd=desc&s=start_time&f=html&byline=By KURT ERICKSON, JG-TC Springfield Bureau">
                    <span class="author vcard"><span class="fn">By KURT ERICKSON, JG-TC Springfield Bureau</span></span>
                </a>
                <span class="hide source-org vcard"><span class="org fn">JG-TC.com</span></span>
This is my flawed code... any more ideas? Did I mention I appreciated your help!?!

    for a in soup.findAll('a'):
        img = a.find('img')
        fn = a.find('fn')
        if img is not None:
            a.replaceWith(img)
        else:
            if fn is not None:
                a.replaceWith(fn)
            else:
                a.extract()
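One possible direction, offered as a sketch rather than a confirmed fix: a.find('fn') searches for a tag literally named fn, while the author's name sits in a span whose class is fn, so the lookup probably needs to go by class instead. The version below assumes the standalone bs4 package and its find_all / replace_with names (calibre bundles its own BeautifulSoup, so inside a recipe the import and method spellings may differ). The sample HTML is trimmed from the snippet above, with a hypothetical image link and plain link added for illustration, and unwrap_links is a made-up helper name.

```python
from bs4 import BeautifulSoup  # assumption: standalone bs4, not calibre's bundled copy

# Byline markup adapted from the post; the img link and the plain link
# are hypothetical additions so that all three branches get exercised.
HTML = """
<div>
<a href="/search/?byline=x"><span class="author vcard"><span class="fn">By KURT ERICKSON, JG-TC Springfield Bureau</span></span></a>
<a href="/photo"><img src="photo.jpg"/></a>
<a href="/other">some other link</a>
</div>
"""

def unwrap_links(soup):
    """Replace each <a> with its image or author span; drop all other links."""
    for a in soup.find_all('a'):
        img = a.find('img')
        # Look the span up by its class attribute, not by a tag called 'fn'.
        fn = a.find('span', attrs={'class': 'fn'})
        if img is not None:
            a.replace_with(img)
        elif fn is not None:
            a.replace_with(fn)
        else:
            a.extract()
    return soup

soup = unwrap_links(BeautifulSoup(HTML, 'html.parser'))
print(soup)
```

The elif just flattens the nested else/if; the behavior is the same. Note that the class lookup would also match the "org fn" span if it ever sat inside a link, so a stricter match may be needed there.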