01-13-2017, 10:37 AM   #31
KevinH
Sigil Developer
Quote:
Originally Posted by slowsmile View Post
... I'm guessing that they probably export as utf-8 on Linux and utf-16 on OSX but not really sure.
Mac OS X always uses utf-8 for terminals, paths, etc. unless the user has forced it to something else.

Quote:
When I researched UnicodeDammit I found that it would identify windows-1252, latin-1, ISO/IEC 8859-2 and utf-8 without much trouble. And while researching UnicodeDammit from bs4 I found out that it also uses the chardet and cchardet modules as well as the codecs module in its routines.
And as I remember, it will also try to detect the charset meta info if it exists. Try reading the file in python3 as binary ('rb'), send the bytes to the UnicodeDammit routine, and see whether it properly detects the encodings. It should.
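
Something along these lines is what I mean; treat it as a rough sketch rather than tested plugin code. The function name read_text_guessing_encoding and the sample file are made up for illustration, and the plain-codecs fallback (utf-8, then cp1252) is only my assumption for the case where bs4 is not installed, not something UnicodeDammit itself does in that order.

```python
import os
import tempfile

try:
    from bs4 import UnicodeDammit  # third-party: beautifulsoup4
except ImportError:
    UnicodeDammit = None


def read_text_guessing_encoding(path):
    """Return (text, encoding_name) for a file whose encoding is unknown."""
    # Read raw bytes so nothing is decoded before detection runs.
    with open(path, "rb") as f:
        raw = f.read()

    if UnicodeDammit is not None:
        dammit = UnicodeDammit(raw)
        # unicode_markup is None when every candidate encoding failed.
        if dammit.unicode_markup is not None:
            return dammit.unicode_markup, dammit.original_encoding

    # Assumed fallback when bs4 is missing: try a couple of common codecs.
    for enc in ("utf-8", "cp1252"):
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte value, so this last resort cannot fail.
    return raw.decode("latin-1"), "latin-1"


# Hypothetical sample: a windows-1252 file that declares its own encoding.
sample = '<?xml version="1.0" encoding="windows-1252"?><p>café</p>'.encode("cp1252")
with tempfile.NamedTemporaryFile(delete=False, suffix=".xml") as tmp:
    tmp.write(sample)

text, encoding = read_text_guessing_encoding(tmp.name)
os.unlink(tmp.name)
print(encoding, text)
```

Opening with 'rb' is the important part: if you open the file in text mode, python3 decodes it with the locale default before UnicodeDammit ever sees the bytes.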

Quote:
I would also completely agree with you about zip supporting utf-8 for file contents. But I was really talking about zip file names. For zip file names I think you'll find that only DOS Latin US charset is allowed.
No, zip also has a flag for the file name encoding (general purpose bit 11, which marks the name as utf-8). And again, a name can always be viewed as a sequence of bytes, like on Linux. Using python3's built-in zipfile module should handle all of this automatically, fwiw.
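
If you want to see the flag for yourself, here is a small sketch using only the standard library. The helper name roundtrip_names and the two file names are invented for the example; the 0x800 mask is bit 11 of the zip general purpose flags.

```python
import io
import zipfile

UTF8_FILENAME_FLAG = 0x800  # bit 11 of the general purpose bit flag


def roundtrip_names(names):
    """Write names into an in-memory zip, then read them back.

    Returns a list of (filename, utf8_flag_set) tuples.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            # zipfile chooses the name encoding and sets the flag itself.
            zf.writestr(name, b"placeholder")

    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return [
            (info.filename, bool(info.flag_bits & UTF8_FILENAME_FLAG))
            for info in zf.infolist()
        ]


# Hypothetical names: one pure ASCII, one that needs utf-8.
result = roundtrip_names(["plain.txt", "Text/日本語.xhtml"])
for filename, is_utf8 in result:
    print(filename, is_utf8)
```

The ASCII name is stored without the flag and the non-ASCII one is stored as utf-8 with the flag set, so both come back unchanged. When the flag is absent on read, zipfile decodes the name as cp437, which is the DOS Latin US charset you were thinking of.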

Take care,

KevinH