MobileRead Forums - View Single Post - Unicode & mi.set_user_metadata('#customxxx', custcol)

DaltonST · 07-23-2014, 10:50 AM

My native Python 2.7 IDLE accepts my utf-8 u'N\xe3o-fic\xe7\xe3o' when I copy it in, and then prints it correctly as Não-ficção on my screen.

I do not use any byte strings. Pure utf-8. I use temp tables in metadata.db, and they show the utf-8 unicode strings properly as Não-ficção on my pc display using a SQLite management application. So metadata.db has the pure utf-8 data, and I get it back from there to update it in mi.set_user_metadata('#customxxx', custcol).
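For anyone who wants to reproduce this round trip outside calibre, here is a minimal sketch using Python's standard sqlite3 module. It makes two assumptions: an in-memory database stands in for metadata.db, and the temp table name demo_custcol is made up, since the post does not show the real tables. It only illustrates that SQLite stores the text as UTF-8 bytes while Python gets a Unicode string back.

```python
import sqlite3

# The same value quoted in the post: a Unicode string, not a byte string.
value = u'N\xe3o-fic\xe7\xe3o'

# Assumption: an in-memory database stands in for metadata.db.
conn = sqlite3.connect(':memory:')
# Assumption: 'demo_custcol' is a hypothetical temp table name.
conn.execute('CREATE TEMP TABLE demo_custcol (val TEXT)')
conn.execute('INSERT INTO demo_custcol (val) VALUES (?)', (value,))

# length(val) counts characters; length(CAST(val AS BLOB)) counts stored bytes.
fetched, n_chars, n_bytes = conn.execute(
    'SELECT val, length(val), length(CAST(val AS BLOB)) FROM demo_custcol'
).fetchone()
conn.close()

# True when the string read back equals the string written.
round_trip_ok = (fetched == value)
```

The character count and the byte count differ because each of the three accented letters takes two bytes in UTF-8.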

According to https://docs.python.org/2/howto/unicode.html, "Under the hood, Python represents Unicode strings as either 16- or 32-bit integers, depending on how the Python interpreter was compiled". That doesn't mean that it does not support utf-8.

The same source also says: "UTF-8 is one of the most commonly used encodings. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit numbers are used in the encoding. (There’s also a UTF-16 encoding, but it’s less frequently used than UTF-8.)"
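The distinction the HOWTO draws can be seen in a few lines. This is a minimal sketch, not calibre code, and the variable names are only for illustration: a Unicode string is a sequence of code points, and UTF-8 is one way of turning those code points into bytes.

```python
# A Unicode string: \xe3 and \xe7 here are code points (U+00E3, U+00E7),
# not UTF-8 bytes.
text = u'N\xe3o-fic\xe7\xe3o'

# Encoding turns code points into bytes; the result depends on the codec.
utf8_bytes = text.encode('utf-8')
latin1_bytes = text.encode('latin-1')

code_points = len(text)            # number of characters
utf8_length = len(utf8_bytes)      # accented letters take two bytes each
latin1_length = len(latin1_bytes)  # one byte per character in Latin-1

# Decoding with the matching codec recovers the original Unicode string.
decoded = utf8_bytes.decode('utf-8')
```

The escapes in the literal happen to match the Latin-1 byte values, which is why the UTF-8 form of the same text looks different (`\xc3\xa3` for ã, `\xc3\xa7` for ç).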

Calibre's bundled copy of Python 2.7.x apparently does not support utf-8, although SQLite does. Otherwise, metadata.db would not be updated correctly.

Kovid, thanks again for your help.
