Old 09-27-2010, 10:25 PM   #9
TonytheBookworm
Quote:
Originally Posted by Starson17
You're close. Note that "esbId=EDW72077" is the player ID. The player ID is in the iframe part of the page you're scraping. Here's code grabbed from a print in the recipe:
Code:
<iframe src="/widget/playercard?esbId=NOR780922&amp;season=2010&amp;gameId=" id="pcard-EOCVFPSS" frameborder="0"></iframe>
You just build the URL, grab the soup with:
Code:
soup = self.index_to_soup(URL)
then put it into your soup of the page where you want it.
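The URL-building step described in the quote can be sketched on its own, outside calibre. This is only an illustration under assumptions: it uses Python 3's standard-library `html.parser` and `urllib.parse` in place of the recipe's soup, the site root `http://www.nfl.com` is assumed to be the right base, and `IframeSrcCollector` and `playercard_urls` are made-up names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = 'http://www.nfl.com'  # assumed site root for the relative iframe src


class IframeSrcCollector(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <iframe>."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'iframe':
            src = dict(attrs).get('src')
            if src:
                # HTMLParser has already turned &amp; into & here.
                self.sources.append(src)


def playercard_urls(html, base=BASE):
    """Return an absolute URL for each iframe src found in the HTML."""
    parser = IframeSrcCollector()
    parser.feed(html)
    parser.close()
    return [urljoin(base, src) for src in parser.sources]


snippet = ('<iframe src="/widget/playercard?esbId=NOR780922&amp;season=2010&amp;gameId=" '
           'id="pcard-EOCVFPSS" frameborder="0"></iframe>')
urls = playercard_urls(snippet)
print(urls)
```

In a recipe, each resulting URL would then be passed to `self.index_to_soup(URL)` as the quote describes.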
One question. I found the iframe:
Code:
<div class="articleText"> <p>CHICAGO -- The Bears say they will hold defensive tackle <a href="/players/tommieharris/profile?id=HAR548445">Tommie Harris</a> out of Monday night's game against the <a href="/teams/greenbaypackers/profile?team=GB">Green Bay Packers</a> on a coach's decision.</p> <p>
<div class="pcard-wrapper  nfl-tag-right" id="pcard-JMEDKDWV-wrapper">
<iframe src="/widget/playercard?esbId=HAR548445&amp;season=2010&amp;gameId=" id="pcard-JMEDKDWV" frameborder="0"></iframe>
</div>
You said to build the URL, then put it into the soup wherever I want it. Can you point me to a recipe that does this, or enlighten me? I might even have done it in the past, but I'm having a memory lapse if I have. Thanks.

Something like this, maybe?

Code:
def preprocess_html(self, soup):
        # Strip inline styles from every tag that has one.
        for item in soup.findAll(attrs={'style':True}):
            del item['style']
        # Each player card is an iframe inside a pcard-wrapper div.
        for pcard in soup.findAll(name='div', attrs={'class':'pcard-wrapper  nfl-tag-right'}):
            widget = pcard.find('iframe')
            if widget is None:
                continue
            # Attributes are read with [] ; widget.src would look for a child <src> tag.
            pcard_url = widget['src']
            print('HERES THE PCARD_URL: %s' % pcard_url)
            URL = 'http://www.nfl.com' + pcard_url
            newsoup = self.index_to_soup(URL)
            # No clue on this, but maybe: swap the iframe for the fetched card (untested guess).
            widget.replaceWith(newsoup)
        return soup
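
The "put it into the soup where you want it" step can also be tried without calibre or a network connection. The sketch below is a guess at the idea, not a tested recipe: it uses the standard library's `xml.etree.ElementTree` as a stand-in for BeautifulSoup, `fetch_card` is a hypothetical replacement for `self.index_to_soup(URL)`, and the page markup is a trimmed, well-formed version of the snippet above.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the article markup.
PAGE = (
    '<div class="articleText">'
    '<p>Article text.</p>'
    '<div class="pcard-wrapper">'
    '<iframe src="/widget/playercard?esbId=HAR548445"></iframe>'
    '</div>'
    '</div>'
)


def fetch_card(url):
    """Hypothetical stand-in for self.index_to_soup(url); no network access."""
    card = ET.Element('div', {'class': 'playercard'})
    card.text = 'card for ' + url
    return card


def inline_iframes(root, base='http://www.nfl.com'):
    """Replace every <iframe> in the tree with the content fetched for its src."""
    # ElementTree has no parent pointers, so visit each element as a parent.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == 'iframe':
                card = fetch_card(base + child.get('src'))
                card.tail = child.tail      # keep any text that followed the iframe
                parent.remove(child)
                parent.insert(index, card)  # the card takes the iframe's position
    return root


root = inline_iframes(ET.fromstring(PAGE))
result = ET.tostring(root, encoding='unicode')
print(result)
```

The point of the sketch is that the fetched content replaces the iframe in place, inside its wrapper div, rather than being inserted at position 0 of the whole page.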


I'm just not grasping this one yet.

Last edited by TonytheBookworm; 09-27-2010 at 11:21 PM. Reason: still pluggin