10-17-2013, 08:09 PM | #1 |
Addict
Posts: 265
Karma: 724240
Join Date: Aug 2013
Device: KyBook
how to handle unicode chars in filenames in python?
Hi,
Well, maybe not exactly Unicode, more like accented characters, so UTF-8 at a minimum. I'm hitting a wall: I need to find a function that converts this Götterdämmerung into this G\xf6tterd\xe4mmerung. I've been all over the source but cannot find how it's done in Python....
Last edited by At_Libitum; 10-17-2013 at 08:21 PM.
10-17-2013, 10:32 PM | #2 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Trying to put non-ASCII characters into filenames robustly is impossible; there is a reason calibre itself converts all filenames to ASCII. Different kernels and different filesystems encode filenames in different ways, and on some filesystems, the encoding can depend on system settings.
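For comparison, one common way to force a name down to ASCII is to decompose accented letters and drop the combining marks. This is a minimal sketch using only the standard library's unicodedata module; the helper name ascii_fold is hypothetical, and this is not calibre's own conversion code.

```python
import unicodedata

def ascii_fold(name):
    """Hypothetical helper: strip accents so a name is pure ASCII.

    NFKD splits "ö" into "o" plus a combining diaeresis; encoding to
    ASCII with errors="ignore" then drops the combining mark (and any
    other character that has no ASCII decomposition).
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(ascii_fold("Götterdämmerung"))  # Gotterdammerung
```

Characters with no ASCII decomposition are lost rather than transliterated (ascii_fold("Straße") gives "Strae"), so anything beyond simple accents would need a proper transliteration table.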
That said, if you are doing it in a limited context and know what encoding to use, then just do filename.encode('utf-8'), assuming filename is a unicode object and not already a bytestring. If it is already a bytestring, then you need to know what encoding it is in and decode it first, like this: filename.decode(encoding).encode('utf-8')
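A minimal sketch of the same advice in Python 3 terms, assuming str stands in for the Python 2 unicode object and bytes for a bytestring; to_utf8 is a hypothetical helper, not a calibre function.

```python
def to_utf8(filename, encoding=None):
    """Hypothetical helper: return filename as UTF-8 bytes.

    If filename is already bytes, `encoding` must name the encoding
    those bytes are in, so they can be decoded before re-encoding.
    """
    if isinstance(filename, bytes):
        if encoding is None:
            raise ValueError("need the source encoding to decode bytes")
        filename = filename.decode(encoding)
    return filename.encode("utf-8")

print(to_utf8("Götterdämmerung"))                    # b'G\xc3\xb6tterd\xc3\xa4mmerung'
print(to_utf8(b"G\xf6tterd\xe4mmerung", "latin-1"))  # same bytes as above
```

Note that UTF-8 turns ö into the two bytes 0xC3 0xB6, not the single byte 0xF6.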
10-18-2013, 08:54 AM | #3 |
Addict
Posts: 265
Karma: 724240
Join Date: Aug 2013
Device: KyBook
Sadly I cannot avoid using them because the reader uses the author-title combo as actual filenames.
.encode('utf-8') gave me the same as what went in, leaving the ö and such as-is. It turns out that format(json.dumps(<name>)) does kinda what I need, but it also inserts a \x00 byte, which I do NOT need.
Last edited by At_Libitum; 10-18-2013 at 08:58 AM.
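For reference, here is what json.dumps returns for this name (Python 3 shown). A plausible explanation for the apparent \x00 is that the default escape for ö is the six characters \u00f6, which contain "00" but no NUL byte; this is an interpretation, not something the post confirms.

```python
import json

name = "Götterdämmerung"

# By default json.dumps escapes every non-ASCII character as \uXXXX
# and wraps the result in double quotes.
print(json.dumps(name))                      # "G\u00f6tterd\u00e4mmerung"

# ensure_ascii=False leaves the characters as they are (quotes still added).
print(json.dumps(name, ensure_ascii=False))  # "Götterdämmerung"
```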
10-18-2013, 09:18 AM | #4 | |
The Grand Mouse 高貴的老鼠
Posts: 71,506
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
Quote:
0xF6 for ö and 0xE4 for ä looks like ISO-8859-1 (ISO Latin-1). In that case, going from a unicode string, you want .encode("iso-8859-1")
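A short sketch of the Latin-1 route (Python 3 shown, where str is the unicode string). Latin-1 maps code points U+0000 to U+00FF one-to-one onto single bytes, so it only covers characters in that range; the last line shows one way to handle anything outside it.

```python
name = "Götterdämmerung"

# Each character in U+0000..U+00FF becomes exactly one byte: ö -> 0xF6, ä -> 0xE4.
latin1 = name.encode("iso-8859-1")
print(latin1)  # b'G\xf6tterd\xe4mmerung'

# "Œ" (U+0152) is not in Latin-1: the default errors="strict" would raise
# UnicodeEncodeError, while errors="replace" substitutes "?".
print("Œuvre".encode("iso-8859-1", errors="replace"))  # b'?uvre'
```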
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
ePub CSS @fontface Unicode chars render in <td>, not in <div>, other elements | Abelinkin | ePub | 2 | 06-05-2012 04:24 AM |
Filenames to metadata, preserving filenames. | nitrogun | Calibre | 5 | 09-13-2010 10:50 PM |
Help a beginner:Python/Recipe Unicode and ASCII | Starson17 | Calibre | 2 | 02-15-2010 11:10 AM |
unicode chars in epubs after flashing | hakim | Sony Reader | 4 | 10-12-2009 08:33 AM |
Python Unicode Demystified | ahi | Workshop | 2 | 09-18-2009 12:45 PM |