MobileRead Forums - View Single Post

jackie_w · 05-25-2018, 02:05 PM

I'm in the process of creating a User Interface plugin as a calibre version of Doitsu's new Sigil plugin (convert selected epub text files to MP3 using the MS Windows Speech API and the LAME encoder).

I'm still doing initial testing, so I'm running my .py scripts via calibre-debug from a Windows .bat file rather than as a plugin. Consequently, I don't have access to the calibre library metadata at the moment, so I'm trying to extract the few items I want from the container.mi object.

My problem is understanding whether a container.mi field contains unicode or something else. For example, this print statement

Code:

print('authors:', container.mi.authors, '\ntitle:', container.mi.title)

produces this in my CMD window

Code:

authors: [u'Yrsa Sigur\xf0ard\xf3ttir']
title: A ‘unicode’ title Yrsa Sigurðardóttir

'title' looks like a unicode string, but 'authors' looks like a list of non-unicode items.
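One way to check, rather than guess from the printed output, is to test the type of each element directly. The following is only a minimal sketch: the values are stand-ins copied from the output above, not read from a real container.mi, and the variable names are made up for illustration. It also shows that `\xf0` and `\xf3` are simply the escaped forms of ð and ó.

```python
# -*- coding: utf-8 -*-
# Stand-in values copied from the CMD output above (assumption: this is
# what container.mi.authors and container.mi.title actually hold).
authors = [u'Yrsa Sigur\xf0ard\xf3ttir']
title = u'A \u2018unicode\u2019 title Yrsa Sigur\xf0ard\xf3ttir'

# unicode on Python 2, str on Python 3
text_type = type(u'')

# Check the type of every element instead of judging by how it prints.
all_text = all(isinstance(a, text_type) for a in authors)
print('all authors are text: %s' % all_text)

# The escaped form of the first author, as it appears inside a printed list.
escaped = authors[0].encode('unicode_escape').decode('ascii')
print('escaped form: %s' % escaped)

# The author string and the tail of the title hold the same characters.
same_name = title.endswith(authors[0])
print('title ends with author: %s' % same_name)
```

If the type check comes back true, the difference between the two printed lines would be down to how a list is displayed (each element in its escaped form) rather than to the data itself.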

Please can you advise how I should be accessing the container.mi data to make sure I always end up with unicode?

I could do what the Sigil plugin does and extract the data directly from container.opf, but I'm sure container.mi has already done a far more robust job of that than anything I could come up with.

The metadata will be used to populate the MP3 tags.
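To illustrate what I mean, here is a rough sketch of turning the two fields into flat text values for tagging. Everything in it is hypothetical: `FakeMi` is a stand-in for container.mi with only the two fields I need, `mp3_tag_values` is a made-up helper, and the artist/album mapping and the ' & ' separator are just assumptions for the example.

```python
# -*- coding: utf-8 -*-
from collections import namedtuple

# Hypothetical stand-in for container.mi, holding only the fields used here.
FakeMi = namedtuple('FakeMi', 'authors title')


def mp3_tag_values(mi, sep=u' & '):
    """Return a dict of text strings built from mi.authors and mi.title.

    The tag names and the separator are assumptions for illustration.
    """
    authors = mi.authors or []
    return {
        u'artist': sep.join(authors),  # list of names -> one text string
        u'album': mi.title or u'',     # fall back to empty text if missing
    }


mi = FakeMi(authors=[u'Yrsa Sigur\xf0ard\xf3ttir'],
            title=u'A \u2018unicode\u2019 title')
tags = mp3_tag_values(mi)

# Both values should be text (unicode on Python 2, str on Python 3).
text_type = type(u'')
print('artist is text: %s' % isinstance(tags[u'artist'], text_type))
print('album is text: %s' % isinstance(tags[u'album'], text_type))
```

Joining the list gives a single string per tag, so whatever writes the MP3 tags never sees the list form.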

I can attach a small test epub if necessary.

Last edited by jackie_w; 05-25-2018 at 02:12 PM. Reason: better example