08-14-2018, 01:08 PM   #1
Phssthpok
Age improves with wine.
Posts: 558
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
Flattening unicode with unidecode() in a plugin

I want to have authors, titles etc. flattened to ASCII, and am trying to use unidecode() to do it. In IDLE, I can say:
Code:
from unidecode import unidecode
print(unidecode('Philip José Farmer'))
and the output is "Philip Jose Farmer" (the accented é has been replaced by an unaccented e).

I downloaded the unidecode module and inserted it into my plugin, and at first I got this:
Code:
SyntaxError: (unicode error) 'utf8' codec can't decode byte 0xe9 in position 0: unexpected end of data
I re-saved the source code as a UTF-8 file and that error went away, but now exactly the same code as above gives me "Philip Jos Farmer" (the accented é is deleted rather than being replaced by e).

The only other notable difference is that my IDLE is using Python 3.7, whereas Calibre is using 2.7.

Can anyone tell me what's going on here? And is there anything already in Calibre to do the flattening, rather than having to include the whole unidecode module in my plugin?
08-14-2018, 02:17 PM   #2
jackie_w
Grand Sorcerer
Posts: 6,205
Karma: 16228558
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
Quote:
Originally Posted by Phssthpok
And is there anything already in Calibre to do the flattening, rather than having to include the whole unidecode module in my plugin?
Off the top of my head I can't remember exactly what each of the following does, but if you download a copy of the calibre source code these functions may be helpful:

from calibre.utils.filenames import ascii_text
from calibre import sanitize_file_name_unicode
from calibre.ebooks.oeb.polish.check.parsing import make_filename_safe

I've used all of them at some point in various calibre plugins.
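For plain author/title strings, a minimal sketch (not tested inside calibre) might try `ascii_text` first and fall back to the standard library when the code runs outside calibre. It assumes `ascii_text` takes a single text string and returns an ASCII transliteration, as its name suggests; check the calibre source to confirm. `flatten` is a hypothetical helper name. Note that the fallback is a different, cruder technique than unidecode: `unicodedata.normalize('NFKD', ...)` splits accented letters into a base letter plus a combining mark, and everything non-ASCII is then dropped.

```python
# -*- coding: utf-8 -*-
import unicodedata

try:
    # Only importable when running inside calibre (plugin, calibre-debug, ...).
    # Assumption: ascii_text(text) returns an ASCII transliteration of text.
    from calibre.utils.filenames import ascii_text
except ImportError:
    ascii_text = None


def flatten(text):
    """Hypothetical helper: reduce an author/title string to plain ASCII."""
    if ascii_text is not None:
        return ascii_text(text)
    # Fallback outside calibre: NFKD splits 'é' into 'e' + U+0301 (combining
    # acute accent); encoding with 'ignore' then drops the combining mark.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(flatten(u'Philip José Farmer'))
```

Outside calibre this prints `Philip Jose Farmer`. The fallback silently drops letters that have no decomposition (ø, ł or ß, for example), which is exactly the gap unidecode's lookup tables fill, so it is worth checking what `ascii_text` does with such names before relying on either path.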
08-14-2018, 09:53 PM   #3
Hopkins
Enthusiast
Posts: 38
Karma: 10
Join Date: Jun 2016
Location: Minnesota USA
Device: Amazon Paperwhite 3G
In Python 2.7 a plain string literal such as 'Philip José Farmer' is a byte string, not a unicode string. This may not help, but try adding the following line to the top of your Python code, before any imports:

from __future__ import (unicode_literals, division, absolute_import, print_function)

Note: You may only need the unicode_literals part.
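A minimal sketch of what the top of such a module might look like. The `# -*- coding: utf-8 -*-` line is an addition not mentioned above: Python 2 needs a declaration like this (PEP 263) before non-ASCII characters may appear in the source, and the file must really be saved in that encoding. The earlier `'utf8' codec can't decode byte 0xe9` error most likely came from a file saved as Latin-1/cp1252, where é is the single byte 0xe9.

```python
# -*- coding: utf-8 -*-
# The coding line must be on the first or second line of the file (PEP 263),
# and the file itself must actually be saved as UTF-8.
from __future__ import (unicode_literals, division, absolute_import, print_function)

name = 'Philip José Farmer'

# With unicode_literals this literal is a unicode object on Python 2.7
# (and an ordinary str on Python 3), so 'é' counts as one character
# instead of the two UTF-8 bytes it would occupy in a Python 2 byte string.
print(type(name).__name__, len(name))
```

On Python 2.7 this should print `unicode 18`, and on Python 3 `str 18`.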
08-15-2018, 04:54 AM   #4
Phssthpok
Age improves with wine.
Posts: 558
Karma: 95229
Join Date: Nov 2014
Device: Kindle Oasis, Kobo Libra II
Quote:
Originally Posted by jackie_w
Off the top of my head I can't remember exactly what each of the following does, but if you download a copy of the calibre source code these functions may be helpful:

from calibre.utils.filenames import ascii_text
from calibre import sanitize_file_name_unicode
from calibre.ebooks.oeb.polish.check.parsing import make_filename_safe

I've used all of them at some point in various calibre plugins.
Many thanks, I'll look at those.

Meanwhile a bit of pottering around in unidecode() revealed the problem -- it uses __import__() to dynamically load a conversion module for each 256-char code page, and I just needed to change the module name to "calibre_plugins.foo.unidecode.X" instead of "unidecode.X". But I'd prefer to use something which is already present in Calibre if possible, so I'll check out your suggestions.
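A simplified sketch of why the é vanished instead of raising an error. It mirrors the lookup described above: each character's code point is split into a 256-character section and a position within it, and the table module for each section is imported on demand. As far as I know, unidecode treats a failed import as "no table" and skips the character, so a wrong package prefix yields `Philip Jos Farmer` rather than an exception. The `x%03x` module naming and the `data` attribute are assumptions about unidecode's layout; `table_module_name`, `make_loader` and `transliterate` are hypothetical names, not unidecode's API.

```python
# -*- coding: utf-8 -*-
import importlib


def table_module_name(package, section):
    # Assumed naming scheme: section 0x00 -> '<package>.x000', 0x01 -> '<package>.x001', ...
    return '%s.x%03x' % (package, section)


def make_loader(package):
    """Return a loader that imports a section's table module and returns its data."""
    def load_table(section):
        module = importlib.import_module(table_module_name(package, section))
        return module.data  # assumed: a 256-entry sequence of ASCII replacements
    return load_table


def transliterate(text, load_table):
    """Simplified per-character loop in the spirit of unidecode()."""
    cache = {}
    out = []
    for ch in text:
        codepoint = ord(ch)
        if codepoint < 0x80:           # plain ASCII passes straight through
            out.append(ch)
            continue
        section, position = codepoint >> 8, codepoint & 0xFF
        if section not in cache:
            try:
                cache[section] = load_table(section)
            except ImportError:
                cache[section] = None  # remember the failure...
        table = cache[section]
        if table is None:
            continue                   # ...and silently drop the character
        out.append(table[position])
    return ''.join(out)


# A package prefix that cannot be imported (as with the un-prefixed module
# name inside a plugin): the section 0x00 table fails to load, so é is dropped.
print(transliterate(u'Philip José Farmer', make_loader('unidecode_not_importable_here')))
```

This prints `Philip Jos Farmer`. With a prefix that does resolve, such as `calibre_plugins.foo.unidecode` from the post above (where `foo` stands for the plugin's own package), the table loads and é maps to e. An untested alternative to hard-coding the plugin name is a relative import, `importlib.import_module('.x%03x' % section, package=__package__)`, though whether calibre's plugin loader supports it would need checking.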
MobileRead.com is a privately owned, operated and funded community.