03-05-2011, 09:47 AM   #15
Starson17
Quote:
Originally Posted by clintiepoo
I can understand what you're saying, but I can't seem to make it work for my website. I wouldn't be offended if you just went ahead and made it work.
Give a fish versus teach to fish.

Besides, if I just do it, I have to actually make it work, whereas if I stick to telling you how to do it yourself, I can always hide behind the claim that it's your fault it didn't work.
Quote:
I'm trying something like this.

Code:
    def preprocess_html(self,soup):
        for pix in soup.findAll('img'):
            new_tag=tag(soup,'p')
            new_tag.insert(0,pix)
            pix.replaceWith(new_tag)
        return soup
In my mind, this should first find all the images.
Yes, it does.

Quote:
Next, it inserts a <p> tag around the img, returning:

<p><img id="img-holder" src="xyz.jpg" alt=" " width="300px"></p>
Correct.
Quote:
Finally, it takes all of this, and uses it in place of what the img before.
Nope. In my long description:
"Now he's got a disconnected div tag with the p tag and img tag inside it. At this point, the img tag (and its image) are no longer on the page. He's going to have to put it back. "
In your case, you created a "disconnected" p tag as a new tag, then moved the img into it, which removed the img from your page. How can you use the p tag "in place of" the img if the page no longer has the img tag on it? The img is now inside the new p tag. You need a reference to a tag that's still on the page.
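To see the lost reference in isolation, here is a minimal sketch. It is not calibre or BeautifulSoup code: it uses Python 3's standard xml.dom.minidom as a stand-in, because it shows the same effect (moving a node into a new parent detaches it from the old one). The sample markup and variable names are made up for illustration.

```python
import xml.dom
from xml.dom import minidom

# Hypothetical page: an img sitting directly inside the body.
doc = minidom.parseString(
    '<body><h1>Title</h1><img src="xyz.jpg"/><div>text</div></body>')
body = doc.documentElement
img = body.getElementsByTagName('img')[0]

new_p = doc.createElement('p')  # disconnected: not attached to the page
new_p.appendChild(img)          # moving img into it removes img from body

# The img is gone from the page; only its old neighbours remain.
still_on_page = [node.tagName for node in body.childNodes]
print(still_on_page)
print(img.parentNode.tagName)

# Trying to swap the img for the new p now fails, because body
# no longer contains the img.
try:
    body.replaceChild(new_p, img)
    replaced = True
except xml.dom.NotFoundErr:
    replaced = False
print(replaced)
```

minidom complains loudly at the last step. Whatever your library does at that point, the cause is the same: the reference you are trying to replace is no longer on the page.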

Quote:
Unfortunately, it doesn't work at all.
Correct.
Quote:
I understand it's not part of the page anymore, but I figured it would stick it on the bottom or something.
Nope. You've lost the reference inside the page.

Quote:
Is there a way to step through this code and watch variables?
Yes, but it's not worth the effort. Just add this print statement:
Code:
print 'My variable x is now: ', x

Quote:
I am still thinking the problem is that my img's parent is the body. I don't know how to fix that.
You don't have to use the body. You can use next, previous, nextSibling, previousSibling, etc. The only reason the code I posted used the parent was that the author needed to keep a placeholder inside the page. The parent was still there when the img was removed, and it had nothing else of value in it, so it could be replaced. Yes, you have to use something that's on the page. You can't replace the entire parent (unless you grab everything else you need from it first). You could try inserting into the parent instead.
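Here is a rough sketch of the sibling idea, again using Python 3's standard xml.dom.minidom as a stand-in rather than BeautifulSoup, so the method names differ from what a recipe would use. The helper wrap_using_sibling and the sample markup are hypothetical. The point is only that the next sibling is grabbed before the img is moved, so it is still on the page afterwards.

```python
from xml.dom import minidom

def wrap_using_sibling(doc, node, wrapper_name='p'):
    """Wrap node in a new element, using its next sibling as the anchor."""
    parent = node.parentNode
    anchor = node.nextSibling      # stays on the page after node is moved
    wrapper = doc.createElement(wrapper_name)
    wrapper.appendChild(node)      # detaches node from parent
    # insertBefore(x, None) appends, which covers a node that was last.
    parent.insertBefore(wrapper, anchor)
    return wrapper

doc = minidom.parseString(
    '<body><h1>Title</h1><img src="xyz.jpg"/><div>text</div></body>')
img = doc.getElementsByTagName('img')[0]
wrap_using_sibling(doc, img)

order = [node.tagName for node in doc.documentElement.childNodes]
print(order)
```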

Quote:
Here's an example what's around the img:
I already looked at it, so you don't need to post it. Your problem is simple. One option is to put a placeholder into the page with next, previous, or nextSibling so you can do a replaceWith or insert into it. Another option is to replace something that you don't need and that's still in the body. Still another option is to just grab a reference to the parent (the body tag), then do an insert (of your p tag with the extracted img) at a numerical position in the body. One of those should work. (I tested one and it worked, but the img was out of order, and I'm not at that machine now.)
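For the numerical-position option, a sketch along these lines may help. It is an assumption-laden illustration, not the code I tested: it uses Python 3's standard xml.dom.minidom instead of BeautifulSoup, and wrap_at_position and the sample markup are invented names. It records where the img sat in its parent before removing it, then inserts the new p at that same position.

```python
from xml.dom import minidom

def wrap_at_position(doc, node, wrapper_name='p'):
    """Wrap node in a new element placed at node's old position."""
    parent = node.parentNode
    index = parent.childNodes.index(node)  # position before removal
    parent.removeChild(node)
    wrapper = doc.createElement(wrapper_name)
    wrapper.appendChild(node)
    remaining = parent.childNodes
    # minidom has no insert-at-index call, so insert before whatever
    # now occupies that index, or append if the node was last.
    ref = remaining[index] if index < len(remaining) else None
    parent.insertBefore(wrapper, ref)
    return wrapper

doc = minidom.parseString(
    '<body><img src="a.jpg"/><h1>Title</h1><img src="b.jpg"/></body>')
# Take a plain list of the images before changing the tree.
for img in list(doc.getElementsByTagName('img')):
    wrap_at_position(doc, img)

order = [node.tagName for node in doc.documentElement.childNodes]
sources = [p.firstChild.getAttribute('src')
           for p in doc.getElementsByTagName('p')]
print(order, sources)
```

Recording the index before the removal is what should keep each image in its original place, rather than ending up at the start or end of the body.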

Last edited by Starson17; 03-05-2011 at 09:52 AM.