MobileRead Forums - View Single Post - Having problems using h1 tag since website changed

clintiepoo · 01-03-2012, 11:36 PM

vtblogger,

That worked very well, thank you! The last problem I'm running into is that I have one more link I want to keep. I think it should be easy enough using an elseif or something, but I seriously struggle with this stuff. I appreciate your help so far.

Inside of this mess, I would like to keep the text of the author's name. Before, I was just keeping that span.

keep_only_tags ... dict(name='span', attrs={'class':'fn'}),

Now, this gets deleted with the a.extract()

Code:

                <a href="/search/?l=50&sd=desc&s=start_time&f=html&byline=By KURT ERICKSON, JG-TC Springfield Bureau">
                    <span class="author vcard"><span class="fn">By KURT ERICKSON, JG-TC Springfield Bureau</span></span>
                </a>
                <span class="hide source-org vcard"><span class="org fn">JG-TC.com</span></span>

This is my flawed code... any more ideas? Did I mention I appreciated your help!?!

for a in soup.findAll('a'):
    img = a.find('img')
    fn = a.find('fn')
    if img is not None:
        a.replaceWith(img)
    else:
        if fn is not None:
            a.replaceWith(fn)
        else:
            a.extract()
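
One likely cause, offered as a sketch rather than a confirmed fix: a.find('fn') searches for a tag named fn, while in the markup above fn is a class on a span, so the lookup returns None and the link falls through to a.extract(). The version below searches by tag name plus class instead. It assumes the standalone bs4 package with its snake_case method names (the code in this post uses the older camelCase findAll/replaceWith); the sample HTML, its shortened href values, and the helper name unwrap_links are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: the author link from the post (href shortened),
# plus an image link and a plain link to exercise the other two branches.
HTML = """
<div>
  <a href="/search/?byline=x"><span class="author vcard"><span class="fn">By KURT ERICKSON, JG-TC Springfield Bureau</span></span></a>
  <span class="hide source-org vcard"><span class="org fn">JG-TC.com</span></span>
  <a href="/photo"><img src="photo.jpg"/></a>
  <a href="/other">unwanted link</a>
</div>
"""


def unwrap_links(soup):
    """Replace each link with its image or author span, else drop it."""
    for a in soup.find_all('a'):
        img = a.find('img')
        # Match a <span> whose class is "fn", not a tag named "fn".
        fn = a.find('span', attrs={'class': 'fn'})
        if img is not None:
            a.replace_with(img)
        elif fn is not None:
            a.replace_with(fn)
        else:
            a.extract()
    return soup


soup = unwrap_links(BeautifulSoup(HTML, 'html.parser'))
print(soup.prettify())
```

Note that only the inner fn span is kept here; the surrounding "author vcard" span is discarded along with the link. In a recipe, a loop like this would presumably run wherever the existing a.extract() loop already lives.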
