Old 03-12-2011, 02:00 PM   #20
clintiepoo
Member
Posts: 19
Karma: 10
Join Date: Feb 2011
Device: kindle
Quote:
Originally Posted by Starson17

Post your code. It should have worked.
Here are my tags. I'm still working on the img and fn tags.

Code:
    keep_only_tags = [
        dict(name='h1'),
        dict(name='span', attrs={'class': 'updated'}),
        dict(name='span', attrs={'class': 'fn'}),
        dict(name='img', attrs={'id': 'img-holder'}),
        dict(name='span', attrs={'id': 'gallery-cutline'}),
        dict(name='div', attrs={'id': 'blox-story-text'}),
    ]
These tags are listed in page order, so the previousSibling approach gets a little more confusing. I was trying to insert the fn tag first, then the image. The fn tag works, but the image gets lost.

Code:
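    # Assumes the recipe file imports Tag: from calibre.ebooks.BeautifulSoup import Tag
    def preprocess_html(self, soup):
        # print 'the soup is: ', soup
        # Wrap each fn span in a <p> and move it into the preceding span.
        for fn_tag in soup.findAll('span', {'class': 'fn'}):
            previousSibling_tag = fn_tag.previousSibling
            # previousSibling can be None or a text node rather than a tag
            if getattr(previousSibling_tag, 'name', None) == 'span':
                new_tag = Tag(soup, 'p')
                new_tag.insert(0, fn_tag)
                previousSibling_tag.insert(1, new_tag)
        # Same for each image; this runs after the fn spans have already moved.
        for img_tag in soup.findAll('img'):
            previousSibling_tag = img_tag.previousSibling
            if getattr(previousSibling_tag, 'name', None) == 'span':
                new_tag = Tag(soup, 'p')
                new_tag.insert(0, img_tag)
                previousSibling_tag.insert(2, new_tag)
        return soup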
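One way to see why the image step is trickier: once the fn span has been moved, the img has a different previous sibling. Below is a rough sketch of that effect using Python 3's standard-library xml.etree.ElementTree instead of calibre's BeautifulSoup, so it runs outside calibre. The markup, the helper names previous_sibling and wrap_into_previous, and the use of append instead of insert(1, ...) are all assumptions for illustration, not the real page or the recipe's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in markup, not the real page.
HTML = (
    '<div>'
    '<span class="updated">date</span>'
    '<span class="fn">author</span>'
    '<img id="img-holder" src="pic.jpg" />'
    '</div>'
)


def previous_sibling(parent, child):
    """Return the element just before child under parent, or None."""
    siblings = list(parent)
    index = siblings.index(child)
    return siblings[index - 1] if index > 0 else None


def wrap_into_previous(parent, child):
    """Wrap child in a <p> and move it into its previous sibling, if that is a <span>."""
    prev = previous_sibling(parent, child)
    if prev is None or prev.tag != 'span':
        return False
    parent.remove(child)
    p = ET.SubElement(prev, 'p')  # new <p> appended inside the previous span
    p.append(child)
    return True


root = ET.fromstring(HTML)
fn = root.find("span[@class='fn']")
img = root.find('img')

# Before anything moves, the img's previous sibling is the fn span.
before = previous_sibling(root, img).get('class')

# Move the fn span first, as in the recipe.
wrap_into_previous(root, fn)

# Now the img's previous sibling is the 'updated' span instead.
after = previous_sibling(root, img).get('class')

moved = wrap_into_previous(root, img)
result = ET.tostring(root, encoding='unicode')
print(before, after, moved)
print(result)
```

ElementTree has no text-node siblings. In BeautifulSoup, whitespace between tags is itself a sibling, so previousSibling may be a text node rather than the span, and the name check has to allow for that.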