Quote:
Originally Posted by macpablus
Now, I'm trying to insert an <hr> tag between the two, but I can't find the way. 
Well done
And a few new things to learn.
First of all: programmers are lazy. Always try to do as much as possible inside of loops.
To do this, we will use findAll instead of find to look for all the image divs in the page. The good thing is that the second parameter (attrs) accepts lists of values.
Code:
images = soup.findAll('div', attrs={'id':['rudy_paz', 'rep']})
This code will find all divs whose id is either 'rudy_paz' or 'rep'. Cool. The result is a list of images, and it contains something if len(images) > 0. (The built-in len function returns the number of elements in a list.)
We can iterate over that list using
Code:
for image in images:
    pass  # do something with the variable image here
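If you want to play with this outside of calibre, here is a rough sketch. It uses Python's standard xml.etree.ElementTree as a stand-in for BeautifulSoup, so it is not the findAll call itself, only the same "match any of several ids, then loop" logic. The sample page and the 'texto' div are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample page; only the ids 'rudy_paz' and 'rep' come from the recipe.
PAGE = """
<html><body>
  <div id="rudy_paz"><img src="a.png"/></div>
  <div id="texto">Some article text</div>
  <div id="rep"><img src="b.png"/></div>
</body></html>
""".strip()

WANTED_IDS = ['rudy_paz', 'rep']

root = ET.fromstring(PAGE)
# Keep every div whose id is one of the wanted values, in document order.
images = [div for div in root.iter('div') if div.get('id') in WANTED_IDS]

# Only loop if something was found.
if len(images) > 0:
    for image in images:
        print(image.get('id'))
```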
To add new elements, the soup offers the insert method along with the Tag class.
To create a new Tag you call something like hr = Tag(soup, "hr"). This creates a <hr></hr>. To add it to the soup at a certain position you may call soup.body.insert(0, hr), where 0 is the position among the children of body. But because programmers are lazy, they will call something like
Code:
soup.body.insert(0, Tag(soup, "hr"))
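The position argument works like a list index among the children of the parent. Here is a small sketch of that idea, with xml.etree.ElementTree from the standard library standing in for BeautifulSoup. Element.insert is only an analogue of the soup's insert, and the sample body is made up.

```python
import xml.etree.ElementTree as ET

# Made-up body with two paragraphs.
body = ET.fromstring('<body><p>first</p><p>second</p></body>')

# Index 0 puts the new element in front of all existing children.
body.insert(0, ET.Element('hr'))
# An index equal to the current number of children appends at the end.
body.insert(len(body), ET.Element('hr'))

print([child.tag for child in body])
```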
Now we have everything we need to do what you wanted. Try to combine this with the old image.extract() and so on. In case of trouble, you may look at the spoiler.
Spoiler:
Code:
def postprocess_html(self, soup, first):
    # Assumes Tag and NavigableString are imported at the top of the recipe,
    # e.g. from calibre.ebooks.BeautifulSoup import Tag, NavigableString
    # Added by a.peter:
    # Try to find the divs containing images
    images = soup.findAll('div', attrs={'id':['rudy_paz', 'rep']})
    # if there are images
    if len(images) > 0:
        # extract them from the soup
        for image in images:
            image.extract()
        # clear the body tag by removing all unneeded elements
        while len(soup.body) > 0:
            soup.body.next.extract()
        # add all images and a <hr/>
        for image in images:
            soup.body.insert(0, image)
            soup.body.insert(0, Tag(soup, "hr"))
        # there is one <hr/> too many, so we remove it
        soup.find('hr').extract()
        return soup
    # no image divs found: fall through to the original table handling
    for table in soup.findAll('table', align='right'):
        img = table.find('img')
        if img is not None:
            img.extract()
            caption = self.tag_to_string(table).strip()
            div = Tag(soup, 'div')
            div['style'] = 'text-align:center'
            div.insert(0, img)
            div.insert(1, Tag(soup, 'br'))
            if caption:
                div.insert(2, NavigableString(caption))
            table.replaceWith(div)
    return soup
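To see why the last <hr/> has to go, and in which order the images end up, here is the image branch of the spoiler replayed on a tiny made-up page. Again this is only a sketch with xml.etree.ElementTree standing in for BeautifulSoup; keep_only_images is a hypothetical helper, not part of calibre.

```python
import xml.etree.ElementTree as ET

def keep_only_images(body, wanted_ids):
    """Hypothetical helper mirroring the image branch of the spoiler."""
    images = [d for d in body.iter('div') if d.get('id') in wanted_ids]
    if len(images) == 0:
        return body
    # Clear the body; this also detaches the image divs from it.
    for child in list(body):
        body.remove(child)
    body.text = None
    # Put each image back at the front, followed by a separator at the front.
    for image in images:
        image.tail = None  # drop any text that followed the div
        body.insert(0, image)
        body.insert(0, ET.Element('hr'))
    # The loop leaves one leading <hr/> too many, so remove the first one.
    body.remove(body.find('hr'))
    return body

# Made-up page: two image divs mixed with other content.
body = ET.fromstring(
    '<body><p>intro</p><div id="rudy_paz"/><p>text</p><div id="rep"/></body>'
)
keep_only_images(body, ['rudy_paz', 'rep'])
print([(child.tag, child.get('id')) for child in body])
```

Note that inserting at position 0 every time reverses the order: the div found last ends up first in the body.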