Old 09-30-2010, 04:26 PM   #5
TonytheBookworm
Maybe Kovid or Starson or someone else will chime in and answer this for you and me. I don't see why the code below wouldn't work, but that's not to say it does.

Code:
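def preprocess_html(self, soup):
    # Only visit anchors that have an href, so a['href'] cannot raise KeyError
    for a in soup.findAll('a', href=True):
        # str.replace matches literal text, not a regex, so pass the bare quote
        a['href'] = a['href'].replace('"', '%22')
    return soup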


Basically, the code above SHOULD look at every anchor tag (link) in your soup and replace every " inside the href with %22, which is the URL percent-encoding for a double quote. Again, this may not work: I didn't have anything to test it on other than your code, and that code didn't generate any links containing a ", so I wasn't able to test it. Give it a shot and see what happens for you.
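If you want to check the replacement without running a whole recipe, something like this might do. It is only a sketch on a plain string: the helper name escape_double_quotes and the example.com URL are made up for illustration and are not part of calibre. It also shows that str.replace looks for literal text, so a pattern like r'(")' only matches the three characters (") and leaves an ordinary quoted link untouched.

```python
def escape_double_quotes(href):
    """Replace each literal double quote in a URL with its percent-encoded form."""
    return href.replace('"', '%22')


# Hypothetical link containing double quotes, made up for this test
sample = 'http://example.com/search?q="calibre"'

# Plain string replacement of the quote character
fixed = escape_double_quotes(sample)

# str.replace does not interpret regex syntax: this searches for the
# literal three-character text (") which does not occur in the sample
literal_pattern = sample.replace(r'(")', '%22')

print(fixed)
print(literal_pattern == sample)
```

If the first line printed has %22 where the quotes were, the same replace call should behave the same way on the href values inside preprocess_html.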