MobileRead Forums - Help a beginner: Python/Recipe Unicode and ASCII

Starson17 · 02-14-2010, 11:09 AM

I'm writing a recipe and am having trouble with Unicode formatting. I want to create an image tag (imgtag) using Beautiful Soup. I've pulled a non-image tag (<photopath>) from another page whose contents are the URL I need, but it's stored as Unicode: u'http://....jpg'.

I've converted the contents of the <photopath> tag to a string using str() to define the src="http://....jpg" part of imgtag, but the converted string has the form: [u'http://....jpg'].

When I use that string as the src of the img tag, I get:
src="[u'http://....jpg']" instead of src="http://....jpg"

I'm reduced to taking a slice of the string [3:-2] to rip off the "[u'" part at the front, and the "']" at the end. Clearly, I'm bad at strings, but can anyone tell me how to handle this better?

I'm starting with a BeautifulSoup tag <photopath> from soup2.find('photopath') that has the http://...jpg string as its NavigableString contents, and I want to use that string as the imgsrc string in a Tag as:

imgtag = Tag(soup, 'img', [('src', imgsrc)])

without resorting to converting to a str and taking a slice to chop part of it out. The content is already a string, so surely there is an easier/cleaner way to do that. What am I missing?

Thanks.
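
One likely explanation, sketched under assumptions: in Beautiful Soup a tag's .contents attribute is a list of child nodes, so calling str() on it returns the text of the whole list, brackets and quotes included, rather than the string inside it. The sketch below uses a plain Python list as a stand-in for soup2.find('photopath').contents so that it runs without Beautiful Soup installed. The URL is a made-up placeholder, not the one from the post.

```python
# Stand-in for soup2.find('photopath').contents: Beautiful Soup keeps a
# tag's children in a list, even when there is only one child.
# The URL is a hypothetical placeholder for illustration only.
contents = [u'http://example.com/photo.jpg']

# str() on the list returns the list's printed form, so the brackets
# and quotes end up inside the resulting string.
as_list_text = str(contents)

# Indexing the list returns the string itself, with nothing to slice off.
imgsrc = contents[0]

print(as_list_text)
print(imgsrc)
```

With the real tag, the equivalent would presumably be imgsrc = soup2.find('photopath').contents[0], or soup2.find('photopath').string when the tag has exactly one string child. Either value can then be passed to Tag(soup, 'img', [('src', imgsrc)]) as written above, with no str() call and no slice.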
