Old 10-17-2013, 08:09 PM   #1
At_Libitum
Addict
how to handle unicode chars in filenames in python?

Hi,

well, maybe not exactly unicode, more like accented characters, so utf-8 at a minimum

I'm hitting a wall: I need to find a function that converts this

Götterdämmerung

into this

G\xf6tterd\xe4mmerung

I've been all over the source but cannot find how it's done in python....

Last edited by At_Libitum; 10-17-2013 at 08:21 PM.
Old 10-17-2013, 10:32 PM   #2
kovidgoyal
creator of calibre
Putting non-ASCII characters into filenames robustly is impossible; there is a reason calibre itself converts all filenames to ASCII. Different kernels and different filesystems encode filenames in different ways, and on some filesystems the encoding can depend on system settings.
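
As a rough illustration of the ASCII approach, here is a minimal Python 3 sketch that strips accents by decomposing each character and dropping whatever does not fit in ASCII. The helper name `to_ascii_filename` is hypothetical, and this is not calibre's own routine, only one common way to get a similar result.

```python
import unicodedata

def to_ascii_filename(name: str) -> str:
    """Hypothetical helper: fold accented letters to plain ASCII."""
    # NFKD splits "ö" into "o" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping non-ASCII code points removes the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii_filename("Götterdämmerung"))  # Gotterdammerung
```

Characters that have no decomposition (for example "ß" or "ø") are dropped entirely by this sketch, so a real implementation would need a transliteration table on top of it.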

That said, if you are doing it in a limited context and know what encoding to use, then just do

filename.encode('utf-8')

assuming filename is a unicode object and not already a byte string. If it is already a byte string, then you need to know what encoding it is in and decode it first, like this

filename.decode(encoding).encode('utf-8')
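
The two cases above can be sketched in Python 3, where `str` plays the role of the unicode object and `bytes` is the byte string. The sample name and the choice of ISO-8859-1 as the source encoding are assumptions made for illustration only.

```python
# Case 1: the name is text (a unicode object), so encode it directly.
filename = "Götterdämmerung"
utf8_bytes = filename.encode("utf-8")
print(utf8_bytes)  # b'G\xc3\xb6tterd\xc3\xa4mmerung'

# Case 2: the name is already a byte string in a known encoding.
raw = b"G\xf6tterd\xe4mmerung"
source_encoding = "iso-8859-1"  # assumption: you must know this up front
converted = raw.decode(source_encoding).encode("utf-8")
print(converted == utf8_bytes)  # True
```

Note that in UTF-8 each accented letter here becomes two bytes, which differs from the single-byte \xf6 and \xe4 form asked for in the first post.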
Old 10-18-2013, 08:54 AM   #3
At_Libitum
Addict
Sadly I cannot avoid using them because the reader uses the author-title combo as actual filenames.

.encode('utf-8') gave me back the same as what went in, so it left the ö and such as-is.

It turns out that format(json.dumps(<name>)) does roughly what I need, but it also inserts a \x00 byte, which I do NOT need.
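
For comparison, a small Python 3 sketch of what `json.dumps` produces for this name, next to the standard `unicode_escape` codec, which yields the literal backslash-x text shown in the first post. Whether the reader actually expects that literal text is an assumption, since the thread does not say; the `00` digits in the JSON escapes may be what appears as the unwanted extra, but that is only a guess.

```python
import json

name = "Götterdämmerung"

# json.dumps escapes non-ASCII characters as \uXXXX and adds quotes.
json_text = json.dumps(name)
print(json_text)  # "G\u00f6tterd\u00e4mmerung"

# unicode_escape writes code points below 256 as \xNN instead.
escaped = name.encode("unicode_escape")
print(escaped.decode("ascii"))  # G\xf6tterd\xe4mmerung
```

Both results are plain ASCII text containing backslashes, not raw bytes 0xF6 and 0xE4.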

Last edited by At_Libitum; 10-18-2013 at 08:58 AM.
Old 10-18-2013, 09:18 AM   #4
pdurrant
The Grand Mouse
Quote:
Originally Posted by At_Libitum View Post
Hi,

well, maybe not exactly unicode, more like accented characters, so utf-8 at a minimum

I'm hitting a wall: I need to find a function that converts this

Götterdämmerung

into this

G\xf6tterd\xe4mmerung

I've been all over the source but cannot find how it's done in python....

0xF6 for ö and 0xE4 for ä look like ISO-8859-1 (ISO Latin-1). In that case, starting from a unicode string, you want

.encode("iso-8859-1")
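
A minimal Python 3 sketch of that suggestion, using the name from the first post. The second name, "Dvořák", is an added example to show what happens with a character that ISO-8859-1 cannot represent.

```python
title = "Götterdämmerung"

# Each accented letter becomes a single Latin-1 byte.
latin1_bytes = title.encode("iso-8859-1")
print(latin1_bytes)  # b'G\xf6tterd\xe4mmerung'

# Characters outside Latin-1 raise UnicodeEncodeError.
try:
    "Dvořák".encode("iso-8859-1")
except UnicodeEncodeError as exc:
    bad_char = exc.object[exc.start:exc.end]
    print("cannot encode:", bad_char)  # cannot encode: ř
```

If names outside Western European languages are possible, an `errors` argument such as "replace" or "ignore" on `encode` avoids the exception at the cost of losing those characters.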