Thread: Unicode issues
Old 07-20-2014, 03:26 PM   #3
DaltonST
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
Re: Unicode Issues, Comparing Strings, ISO Encoding, SQL and Other Maladies

After a multitude of Unicode issues, mostly when comparing strings between UTF-8 and ISO 8859-15 and when building SQL statements from variables whose values originated in metadata.db (which caused runtime failures), I stumbled upon this little gem, which I would like to share with the forum. I have not seen this syntax in any other documentation. See: http://stackoverflow.com/questions/2...15-with-python

>>> a = 'ü'
>>> a.decode('utf8') # terminal is configured to use UTF-8 by default
u'\xfc'
>>> a.decode('utf8').encode('iso8859-15')
'\xfc'

So, the secret to keeping Python 2 from "covertly" decoding a byte string as ASCII before it re-encodes (or tries to re-encode) it to iso8859-15, and hence failing on or losing the non-ASCII characters such as those in 'não-ficção', is to decode it explicitly as UTF-8 first. Use this syntax:

>>>>>> a.decode('utf8').encode('iso8859-15') <<<<<<<<<<
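
For anyone who wants to reuse this, here is a minimal sketch of the same decode-then-encode idea wrapped in a function. The name utf8_to_latin9 is my own invention for illustration, not something from calibre or the linked answer, and the sketch assumes the input really is UTF-8 bytes. It also shows the comparison case: decode both sides to unicode before comparing, instead of comparing the raw bytes.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (the name is an assumption, not a calibre API).
# Assumes raw_bytes holds UTF-8 encoded text.
def utf8_to_latin9(raw_bytes):
    """Re-encode UTF-8 bytes as ISO 8859-15 via an explicit unicode step."""
    text = raw_bytes.decode('utf8')      # bytes -> unicode, no implicit ASCII decode
    return text.encode('iso8859-15')     # unicode -> ISO 8859-15 bytes


# u'n\xe3o-fic\xe7\xe3o' is 'não-ficção' written with escapes so the
# example does not depend on the source file's encoding.
utf8_bytes = u'n\xe3o-fic\xe7\xe3o'.encode('utf8')
latin9_bytes = utf8_to_latin9(utf8_bytes)

# The raw bytes differ between the two encodings, so compare the decoded text.
same_bytes = (utf8_bytes == latin9_bytes)
same_text = (utf8_bytes.decode('utf8') == latin9_bytes.decode('iso8859-15'))
```

Here same_bytes comes out False and same_text comes out True, which is the point: the two byte strings only match once both have been decoded with their own codec.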