MobileRead Forums - View Single Post

DaltonST · 07-20-2014, 03:26 PM

Re: Unicode Issues, Comparing Strings, ISO Encoding, SQL and Other Maladies

After a multitude of Unicode issues when comparing strings between UTF-8 and ISO-8859-15, and more generally when building SQL statements from variables whose values originated in metadata.db and were causing runtime failures, I stumbled upon this little gem, which I would like to share with the forum. I have not seen this syntax in any other documentation. See: http://stackoverflow.com/questions/2...15-with-python

>>> a = 'ü'
>>> a.decode('utf8') # terminal is configured to use UTF-8 by default
u'\xfc'
>>> a.decode('utf8').encode('iso8859-15')
'\xfc'

So, the secret to keeping Python 2 from "covertly" decoding a byte string as ASCII before it encodes (or tries to encode) it to iso8859-15, which raises a UnicodeDecodeError as soon as the string contains non-ASCII characters such as those in 'não-ficção', is to decode explicitly first, using this syntax:

>>>>>> a.decode('utf8').encode('iso8859-15') <<<<<<<<<<
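For anyone who wants to try this outside the interpreter prompt, here is a minimal sketch of the same idea wrapped in two small helpers. The names utf8_to_latin9 and same_text are my own hypothetical names, not anything from calibre or the Stack Overflow answer, and the sketch assumes the input really is UTF-8 encoded bytes. It uses explicit byte literals so it should behave the same under Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: the helper names below are hypothetical, not part of calibre or Python.

def utf8_to_latin9(raw):
    """Decode UTF-8 bytes explicitly, then encode the result as ISO-8859-15 bytes."""
    return raw.decode('utf8').encode('iso8859-15')

def same_text(utf8_raw, latin9_raw):
    """Compare two byte strings by first decoding each one with its own codec."""
    return utf8_raw.decode('utf8') == latin9_raw.decode('iso8859-15')

# 'não-ficção' written out as UTF-8 bytes (assumed input encoding)
utf8_raw = b'n\xc3\xa3o-fic\xc3\xa7\xc3\xa3o'
latin9_raw = utf8_to_latin9(utf8_raw)

print(repr(latin9_raw))
print(same_text(utf8_raw, latin9_raw))
```

The point of same_text is that the comparison happens on decoded text rather than on raw bytes, so the two encodings of the same word compare equal even though their bytes differ.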
