

10-17-2013, 08:09 PM #1
At_Libitum
Addict
Posts: 265
Karma: 724240
Join Date: Aug 2013
Device: KyBook
How to handle Unicode chars in filenames in Python?

Hi,

Well, maybe not exactly Unicode, more like accented characters, so UTF-8 at a minimum.

I'm hitting a wall. I need to find a function that converts this

Götterdämmerung

into this

G\xf6tterd\xe4mmerung

I've been all over the source but cannot find how it's done in Python.

Last edited by At_Libitum; 10-17-2013 at 08:21 PM.
10-17-2013, 10:32 PM #2
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Trying to put non-ASCII characters into filenames robustly is impossible; there is a reason calibre itself converts all filenames to ASCII. Different kernels and different filesystems encode filenames in different ways, and on some filesystems the encoding can depend on system settings.
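For illustration only, here is a minimal Python 3 sketch of what reducing a filename to ASCII can look like, using the standard unicodedata module. The helper name ascii_fold is hypothetical and this is not calibre's own implementation; it decomposes accented letters (NFKD normalization) and then drops the combining marks.

```python
import unicodedata


def ascii_fold(name: str) -> str:
    """Hypothetical helper: reduce a filename to plain ASCII.

    NFKD splits e.g. 'ö' into 'o' + COMBINING DIAERESIS; encoding to
    ASCII with errors='ignore' then drops the combining mark, and also
    silently drops any character with no ASCII decomposition at all.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(ascii_fold("Götterdämmerung"))  # Gotterdammerung
```

The trade-off is that the result is lossy (non-Latin scripts disappear entirely), which is the price of a name that every filesystem will store the same way.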

That said, if you are doing it in a limited context and know what encoding to use, then just do

filename.encode('utf-8')

assuming filename is a unicode object and not already a bytestring. If it is already a bytestring, then you need to know what encoding it is in and decode it first, like this:

filename.decode(encoding).encode('utf-8')
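The thread is written in Python 2 terms, where a unicode object is text and str is a bytestring; in Python 3 those types are str and bytes. A minimal Python 3 sketch of the two cases, with a hypothetical helper to_utf8, follows. Note that UTF-8 stores each of these accented letters as two bytes (ö is \xc3\xb6), not the single byte \xf6 asked for in the first post.

```python
def to_utf8(filename, encoding=None):
    """Hypothetical helper: return the UTF-8 bytes for a filename.

    `filename` may be a str (already Unicode text) or bytes; bytes
    must first be decoded with their known `encoding`.
    """
    if isinstance(filename, bytes):
        if encoding is None:
            raise ValueError("need the original encoding to decode bytes")
        filename = filename.decode(encoding)
    return filename.encode("utf-8")


print(to_utf8("Götterdämmerung"))
# b'G\xc3\xb6tterd\xc3\xa4mmerung'
print(to_utf8(b"G\xf6tterd\xe4mmerung", "iso-8859-1"))
# b'G\xc3\xb6tterd\xc3\xa4mmerung'
```

Printing those bytes to a UTF-8 terminal (or decoding them again) displays ö as usual, so the conversion is only visible in the byte representation.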
10-18-2013, 08:54 AM #3
At_Libitum
Addict
Sadly, I cannot avoid using them, because the reader uses the author-title combination as the actual filename.

.encode('utf-8') gave me back the same as what went in, leaving the ö and similar characters as-is.

And it turns out that format(json.dumps(<name>)) does roughly what I need, but it also inserts a \x00 byte, which I do NOT need.
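A likely explanation for the stray zeros, sketched in Python 3: json.dumps escapes non-ASCII characters by default (ensure_ascii=True) as \uXXXX, so ö becomes the six ASCII characters \u00f6. The 00 is part of that escape rather than a separate \x00 byte, and the output is JSON text (with surrounding quotes), not Latin-1 bytes.

```python
import json

name = "Götterdämmerung"

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(name)
print(escaped)  # "G\u00f6tterd\u00e4mmerung" (quotes included)

# The escape is plain ASCII text; there is no NUL character in it.
print("\x00" in escaped)  # False
```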

Last edited by At_Libitum; 10-18-2013 at 08:58 AM.
10-18-2013, 09:18 AM #4
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,506
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
Quote:
Originally Posted by At_Libitum
Hi,

Well, maybe not exactly Unicode, more like accented characters, so UTF-8 at a minimum.

I'm hitting a wall. I need to find a function that converts this

Götterdämmerung

into this

G\xf6tterd\xe4mmerung

I've been all over the source but cannot find how it's done in Python.

0xF6 for ö and 0xE4 for ä looks like ISO-8859-1 (ISO Latin-1), in which case, going from a unicode string, you want

.encode("iso-8859-1")
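A minimal Python 3 check of this: Latin-1 maps code points U+0000 to U+00FF one-to-one onto single bytes, so it reproduces the requested output exactly. The second example (the name "Dvořák" is only an illustration) shows the catch: characters outside that range cannot be represented, and the errors= argument decides what happens to them.

```python
name = "Götterdämmerung"

# ISO-8859-1 maps U+0000..U+00FF to single bytes,
# so ö (U+00F6) -> 0xF6 and ä (U+00E4) -> 0xE4.
latin1 = name.encode("iso-8859-1")
print(latin1)  # b'G\xf6tterd\xe4mmerung'

# ř (U+0159) is outside Latin-1: the default errors="strict" would
# raise UnicodeEncodeError; errors="replace" substitutes '?'.
print("Dvořák".encode("iso-8859-1", errors="replace"))  # b'Dvo?\xe1k'
```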
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
ePub CSS @fontface Unicode chars render in <td>, not in <div>, other elements | Abelinkin | ePub | 2 | 06-05-2012 04:24 AM
Filenames to metadata, preserving filenames. | nitrogun | Calibre | 5 | 09-13-2010 10:50 PM
Help a beginner:Python/Recipe Unicode and ASCII | Starson17 | Calibre | 2 | 02-15-2010 11:10 AM
unicode chars in epubs after flashing | hakim | Sony Reader | 4 | 10-12-2009 08:33 AM
Python Unicode Demystified | ahi | Workshop | 2 | 09-18-2009 12:45 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.