#1 |
Reader
Posts: 520
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
|
Problem with Unicode masthead text
I am working on a new recipe for the newspaper La Razón, Bolivia, because the old one no longer works since the website changed. I have the new recipe mostly working; it just needs some aesthetic changes. I will post it when it is finished.
The paper's website no longer has a suitable masthead image, so I used
Code:
def get_masthead_title(self):
    return u'La Razón'
However, the title in the generated masthead image came out as La RazÃ³n. The PIL draw text method doesn't expect a UTF-8 encoded byte string, but it accepts a normal Unicode string. So I changed generate_masthead in calibre/ebooks/__init__.py to eliminate the conversion to UTF-8, and that solved the problem. Essentially, the text = title.encode('utf-8') line should be removed and title used instead of text. Here is the diff:
Code:
diff -u /Applications/calibre.app/Contents/Resources/Python/site-packages/calibre/ebooks/__init__.py.\~1\~ /Applications/calibre.app/Contents/Resources/Python/site-packages/calibre/ebooks/__init__.py
--- /Applications/calibre.app/Contents/Resources/Python/site-packages/calibre/ebooks/__init__.py.~1~  2011-12-26 18:40:30.000000000 +0100
+++ /Applications/calibre.app/Contents/Resources/Python/site-packages/calibre/ebooks/__init__.py  2011-12-26 23:03:10.000000000 +0100
@@ -240,11 +240,10 @@
         font = ImageFont.truetype(font_path, 48)
     except:
         font = ImageFont.truetype(default_font, 48)
-    text = title.encode('utf-8')
-    width, height = draw.textsize(text, font=font)
+    width, height = draw.textsize(title, font=font)
     left = max(int((width - width)/2.), 0)
     top = max(int((height - height)/2.), 0)
-    draw.text((left, top), text, fill=(0,0,0), font=font)
+    draw.text((left, top), title, fill=(0,0,0), font=font)
     if output_path is None:
         f = StringIO()
         img.save(f, 'JPEG')
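The symptom can be reproduced without PIL. The following is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). It assumes the drawing code treats every byte of the encoded string as one Latin-1 character; that assumption fits the wrong rendering discussed in this thread but is not taken from the PIL source.

```python
title = 'La Raz\u00f3n'            # the intended title as a Unicode string

encoded = title.encode('utf-8')    # the conversion the diff removes
# Assumption: each byte is drawn as a separate Latin-1 character.
rendered = encoded.decode('latin-1')

print(len(title), len(encoded))    # 8 9
print(rendered)                    # La RazÃ³n
```

The single character ó becomes two bytes in UTF-8, and each of those bytes shows up as its own glyph.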
#2 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
You could have let me know the recipe needed an update, since I wrote the original one.
|
#3 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
The updated recipe will be included in the next release of Calibre:
https://bugs.launchpad.net/calibre/+bug/908912
#4 | |
Reader
Posts: 520
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
|
Quote:
However, this topic is about the problem with Unicode masthead titles. Shall I make a bug report for it?
By the way, I spent a very nice holiday in Argentina last year: 4 days in BA and 4 days in Iguazu, on the road to Bolivia.
|
#5 |
creator of calibre
Posts: 45,195
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
IIRC PIL requires UTF-8 encoded text. There's probably some double encoding going on somewhere. Try this instead:
text = title.encode('utf-8') if isinstance(title, unicode) else title
And also use u'La Raz\xc3\xb3n' instead of u'La Razón'
Last edited by kovidgoyal; 12-27-2011 at 04:34 AM.
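For readers on Python 3, where the `unicode` type no longer exists, the same guard can be sketched as follows. `as_utf8_bytes` is a hypothetical helper name, not part of calibre or PIL; the sketch only shows that checking the type first prevents the text from being encoded twice.

```python
def as_utf8_bytes(title):
    """Return UTF-8 bytes for text; pass bytes through unchanged."""
    if isinstance(title, str):
        return title.encode('utf-8')
    return title


once = as_utf8_bytes('La Raz\u00f3n')
twice = as_utf8_bytes(once)    # already bytes, so nothing changes

print(once)            # b'La Raz\xc3\xb3n'
print(once == twice)   # True
```

The guard avoids a second encoding step; whether the drawing call should receive bytes or text at all is a separate question.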
#6 | |
Reader
Posts: 520
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
|
Quote:
The PIL documentation contains an example where Unicode text is given to draw:
Code:
font = ImageFont.truetype("symbol.ttf", 16, encoding="symb")
draw.text((0, 0), unichr(0xF000 + 0xAA))
|
#7 |
creator of calibre
Posts: 45,195
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That code is in there for a reason. And u'La Raz\xc3\xb3n' is not a double encoding, it is an ascii representation to ensure your problem isn't coming from the python interpreter parsing the .py file incorrectly.
|
#8 | |
Reader
Posts: 520
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
|
I don't know the reason the code is there. Maybe there used to be a reason that is no longer valid.
And u'La Raz\xc3\xb3n' is a double encoding; it is not an ASCII representation of u'La Razón'. That would be u'La Raz\xf3n', which is probably what you meant. What you have written is the UTF-8 encoding of ó, put as ASCII escapes into a Unicode string. Putting UTF-8 bytes in a Unicode string is wrong most of the time. If I print that string it outputs La RazÃ³n, which is exactly the text I got in the masthead image, showing that the UTF-8 encoding that the code does should not be done.
And the parser didn't parse it incorrectly, because I had a # -*- coding: utf-8 -*- line and saved the file in UTF-8. To safeguard against source code problems, \xf3 could indeed be used, but then the title.encode('utf-8') would still cause the wrong rendering.
Fredrik Lundh, the author of PIL, also says that the text can be a Unicode string if the font you use supports Unicode. Here is an example that you can try to see that it works.
Quote:
|
|
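The example referred to in this post is not preserved here. As a separate illustration of the escape-sequence point, here is a minimal sketch in Python 3 syntax, where every string literal is Unicode and the u prefix is dropped. The variable names are made up for this sketch.

```python
import unicodedata

intended = 'La Raz\xf3n'         # one character after "z": U+00F3
suggested = 'La Raz\xc3\xb3n'    # two characters after "z": U+00C3, U+00B3

for label, s in (('intended', intended), ('suggested', suggested)):
    names = [unicodedata.name(ch) for ch in s if ord(ch) > 127]
    print(label, len(s), names)
# intended 8 ['LATIN SMALL LETTER O WITH ACUTE']
# suggested 9 ['LATIN CAPITAL LETTER A WITH TILDE', 'SUPERSCRIPT THREE']

# The suggested literal equals the UTF-8 bytes of the intended one
# read back as Latin-1 characters.
assert suggested == intended.encode('utf-8').decode('latin-1')

# Encoding it once more yields doubly encoded bytes.
print(suggested.encode('utf-8'))   # b'La Raz\xc3\x83\xc2\xb3n'
```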
#9 |
creator of calibre
Posts: 45,195
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Fix committed.
|