Old 08-31-2010, 08:43 PM   #2584
TonytheBookworm
Quote:
Originally Posted by Starson17
Here is what I gave you last time. Why doesn't that work?
Code:
        for item in soup.findAll('h2'):
            link = item.find('a')
            if link:


In line 1 it finds all the <h2> tags.
In line 2 it looks at each one to decide if there is an <a> tag inside.
In line 3, if there was an <a> tag found, it proceeds to do what needs to be done (look at the code I gave you again).
I looked at the http://www.laineygossip.com/ page and it seems to have the same structure, with <a> tags (having the link you want) inside <h2> tags.
I think you're missing what I was trying to ask, or I asked it wrong. Yes, your code works fine, even on this page, but there is an exception, and that exception is what I'm having trouble with. The for item in soup.findAll('h2') part works great, and I actually got that working. My issue is with the first part of the page, where some of the articles are not within that structure. I will keep working at it and see what I can come up with. I really wanted to figure this one out using what you have taught me.

This is the part that is throwing me. Note that its link is not in the <h2>/<a> structure that the rest of the page uses. I hope that explains what I mean, and I hope I'm not bugging you about this. If so, just say so and I'll chill.


Code:
<div class="artIntroShort">																						
			<p><span class="adpad300hp"><script src="http://ad.ca.doubleclick.net/adj/upt.laineygossip.home;tile=1;sz=300x250;ord={0}?" language="JavaScript1.1"></script><noscript>&lt;A HREF="http://ad.ca.doubleclick.net/jump/upt.laineygossip.home;tile=1;sz=300x250;ord={0}?" TARGET="_blank"&gt;&lt;IMG SRC="http://ad.ca.doubleclick.net/ad/upt.laineygossip.home;tile=1;sz=300x250;ord={0}?" BORDER="0" WIDTH="300" HEIGHT="250" ALT="Click Here" /&gt;&lt;/A&gt;</noscript></span>Dear Gossips,<br><br>Sorry to be a buzzkill but I think it’s the end of summer. Those science people may say it’s officially September 21 to mark the equinox but symbolically, for most of us, it’s really the start of school, even when we’re not in school. Or the Venice Film Festival when the stars get back to work, leading straight into TIFF and the VMAs and Fashion Week and then the fall movie schedule which is really when the jostling begins. That’s tomorrow, and it brings to end the slow season of celebrity. <br><br>Like clockwork then, Vanity Fair is releasing excerpts from their Lindsay Lohan exclusive and tabloid Wednesday tomorrow should be even bullsh-ttier than usual. <a href="/intro_31aug10.aspx?CatID=0&amp;CelID=0">Full Intro</a></p>											
			<p></p>
			<p class="comment">Posted at 6:53 AM</p>
		    </div>
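
One possible way to catch the intro link along with the regular ones is to accept an <a> tag when it sits inside either an <h2> or the artIntroShort <div>. The sketch below is only an illustration of that idea: it uses Python's standard-library html.parser instead of the BeautifulSoup calls in the quoted code so that it runs on its own, and the class name LinkCollector, the trimmed SAMPLE markup, and the /some_article.aspx and /ignored.aspx links are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect hrefs of <a> tags found inside <h2> or <div class="artIntroShort">."""

    def __init__(self):
        super().__init__()
        self.h2_depth = 0      # > 0 while the parser is inside an <h2>
        self.intro_depth = 0   # > 0 while inside the artIntroShort <div>
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'h2':
            self.h2_depth += 1
        elif tag == 'div':
            # Count nested <div>s too, so the matching </div> closes the right one.
            classes = (attrs.get('class') or '').split()
            if self.intro_depth or 'artIntroShort' in classes:
                self.intro_depth += 1
        elif tag == 'a' and (self.h2_depth or self.intro_depth):
            href = attrs.get('href')
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == 'h2' and self.h2_depth:
            self.h2_depth -= 1
        elif tag == 'div' and self.intro_depth:
            self.intro_depth -= 1


# Trimmed, partly hypothetical markup modelled on the snippet above.
SAMPLE = '''
<div class="artIntroShort">
  <p>Dear Gossips,<br><br>... <a href="/intro_31aug10.aspx?CatID=0&amp;CelID=0">Full Intro</a></p>
  <p class="comment">Posted at 6:53 AM</p>
</div>
<h2><a href="/some_article.aspx">Some article</a></h2>
<p><a href="/ignored.aspx">not collected</a></p>
'''

parser = LinkCollector()
parser.feed(SAMPLE)
links = parser.links
print(links)
```

In a recipe that already has a soup object, the same idea would probably come down to one extra lookup before the existing loop, something like soup.find('div', attrs={'class': 'artIntroShort'}) followed by a .find('a') on the result, but I have not tested that against the live page.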