Thanks, TonytheBookworm, for helping me with this. The script seems to work for now, but, as you said, just when I need a URL with double quotes to test it on, I couldn't find one.
Well, good news: while writing this, I found out that the link
http://tuoitre.vn/Chinh-tri-Xa-hoi/4...-cay-canh.html
and the link
http://tuoitre.vn/Chinh-tri-Xa-hoi/403734/Kiem-lam-va-cong-an-"canh-giu"-doan-xe-tai-cho-cay-canh.html
both worked in my browser (Chrome), and that the script worked fine with or without the code you suggested. It seems the problem fixed itself (hopefully for good). I honestly don't know how, but thanks a lot for your help anyway. I'll keep your code in the script just in case.
@Mike L: thanks for your suggestion as well, but I know very little about Python, so I just don't know how to use ".
Quote:
Originally Posted by TonytheBookworm
Maybe Kovid or Starson or someone else will chime in and answer this for you and me. I don't see why the below doesn't work, but that's not saying it does either.
Basically, the above SHOULD find all anchor tags (links) in your soup and then do a regular expression lookup for every " inside the href. If it finds one, it replaces it with %22, which is the URL (percent) encoding of a double quote. Again, this may not work, but I didn't really have anything to test it on other than your code, and your code didn't generate any links with " in them, so I wasn't able to test it. Give it a shot and see what happens for you.
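For anyone reading later, here's a minimal sketch of the approach described above, in Python (the language calibre recipes are written in). It's not the code from the quoted post. The helper names `escape_href_quotes` and `fix_links` are made up for illustration, and the idea of passing it `soup.findAll('a', href=True)` from inside a recipe's `preprocess_html` is an assumption about how you'd wire it in. It only needs objects that support `.get('href')` and `['href'] = ...`, which BeautifulSoup tags do, so plain dicts stand in for tags here.

```python
import re

# Matches a literal double quote character.
QUOTE_RE = re.compile(r'"')


def escape_href_quotes(href):
    """Replace every literal double quote in a URL with its percent-encoding, %22."""
    return QUOTE_RE.sub('%22', href)


def fix_links(anchors):
    """Rewrite the href of each anchor in place and return the same iterable.

    `anchors` is any iterable of dict-like tags. In a recipe this could
    (hypothetically) be soup.findAll('a', href=True) inside preprocess_html.
    Anchors without an href, or without a double quote in it, are left alone.
    """
    for a in anchors:
        href = a.get('href')
        if href and '"' in href:
            a['href'] = escape_href_quotes(href)
    return anchors


# Example with plain dicts standing in for <a> tags.
links = [
    {'href': 'http://tuoitre.vn/Chinh-tri-Xa-hoi/403734/Kiem-lam-va-cong-an-"canh-giu"-doan-xe-tai-cho-cay-canh.html'},
    {'href': 'http://example.com/plain.html'},
]
fix_links(links)
print(links[0]['href'])
# -> http://tuoitre.vn/Chinh-tri-Xa-hoi/403734/Kiem-lam-va-cong-an-%22canh-giu%22-doan-xe-tai-cho-cay-canh.html
```

The substitution is deliberately narrow: it only touches double quotes, which matches what was described, and leaves everything else in the URL exactly as it was.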