#1
Reader
Posts: 520
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
Problem with Unicode masthead text
I am working on a new recipe for the Bolivian newspaper La Razón, because the old one no longer works since the website changed. I have the new recipe mostly working; it just needs some aesthetic changes. I will post it when it is finished.
The paper's website no longer has a suitable masthead image, so I used a text title instead: Code:
def get_masthead_title(self):
    return u'La Razón'
However, the ó came out garbled in the generated masthead image (as La RazÃ³n). PIL's draw.text method doesn't expect a UTF-8 encoded byte string; it accepts a normal Unicode string. So I changed generate_masthead in calibre/ebooks/__init__.py to remove the conversion to UTF-8, and that solved the problem. Essentially, the line text = title.encode('utf-8') should be removed and title used instead of text. Here is the diff: Code:
diff -u /Applications/calibre.app/Contents/Resources/Python/site-packages/calibre/ebooks/__init__.py.\~1\~ /Applications/calibre.app/Contents/Resources/Python/site-packages/calibre/ebooks/__init__.py
--- /Applications/calibre.app/Contents/Resources/Python/site-packages/calibre/ebooks/__init__.py.~1~ 2011-12-26 18:40:30.000000000 +0100
+++ /Applications/calibre.app/Contents/Resources/Python/site-packages/calibre/ebooks/__init__.py 2011-12-26 23:03:10.000000000 +0100
@@ -240,11 +240,10 @@
         font = ImageFont.truetype(font_path, 48)
     except:
         font = ImageFont.truetype(default_font, 48)
-    text = title.encode('utf-8')
-    width, height = draw.textsize(text, font=font)
+    width, height = draw.textsize(title, font=font)
     left = max(int((width - width)/2.), 0)
     top = max(int((height - height)/2.), 0)
-    draw.text((left, top), text, fill=(0,0,0), font=font)
+    draw.text((left, top), title, fill=(0,0,0), font=font)
     if output_path is None:
         f = StringIO()
         img.save(f, 'JPEG')
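The garbled masthead can be reproduced without PIL. This is a minimal Python 3 sketch, assuming as a simplification (not PIL's actual code) that a renderer handed a byte string draws one Latin-1 character per byte; the function name `glyphs_drawn` is hypothetical:

```python
def glyphs_drawn(title, encode_first):
    """Return the characters a byte-per-character renderer would draw.

    Simplifying assumption: bytes are mapped one-to-one onto Latin-1
    characters, so each UTF-8 byte becomes a separate glyph.
    """
    if encode_first:
        # Old behaviour: text = title.encode('utf-8') before drawing
        data = title.encode('utf-8')
        return data.decode('latin-1')
    # Patched behaviour: the Unicode title is passed straight through
    return title


title = 'La Raz\u00f3n'  # 'La Razón' written with an ASCII-only escape
before = glyphs_drawn(title, encode_first=True)   # 'La RazÃ³n'
after = glyphs_drawn(title, encode_first=False)   # 'La Razón'
```

The 'ó' (U+00F3) takes two bytes in UTF-8, 0xC3 0xB3, which is why one character turns into the two characters 'Ã³'.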
#2
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
You could have let me know that the recipe needed an update, since I wrote the original one.
#3
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
The updated recipe will be included in the next release of Calibre:
https://bugs.launchpad.net/calibre/+bug/908912
#4
Reader
Posts: 520
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
Quote:
However, this topic is about the problem with Unicode masthead titles. Should I file a bug report for it? By the way, I spent a very nice holiday in Argentina last year: 4 days in Buenos Aires and 4 days in Iguazu, on the way to Bolivia.
#5
creator of calibre
Posts: 45,609
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
IIRC PIL requires UTF-8 encoded text. There's probably some double encoding going on somewhere. Try this instead:
text = title.encode('utf-8') if isinstance(title, unicode) else title
And also use u'La Raz\xc3\xb3n' instead of u'La Razón'.
Last edited by kovidgoyal; 12-27-2011 at 05:34 AM.
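The guard in the suggested line only encodes when it is given Unicode text, so a title that is already a byte string is not encoded a second time. Here is a Python 3 sketch of the same idea, where `str` plays the role of Python 2's `unicode`; the helper name `to_utf8_bytes` is hypothetical:

```python
def to_utf8_bytes(title):
    """Encode text to UTF-8 exactly once; return bytes unchanged.

    Python 3 analogue of:
        text = title.encode('utf-8') if isinstance(title, unicode) else title
    """
    if isinstance(title, str):  # str is Python 3's Unicode text type
        return title.encode('utf-8')
    return title  # already a byte string: do not encode again


once = to_utf8_bytes('La Raz\u00f3n')  # b'La Raz\xc3\xb3n'
again = to_utf8_bytes(once)           # unchanged: b'La Raz\xc3\xb3n'
```

In Python 2, calling `.encode('utf-8')` on a byte string first decodes it implicitly with the ASCII codec, which raises `UnicodeDecodeError` for non-ASCII bytes; the `isinstance` check avoids that. For a Unicode title the result is still UTF-8 bytes, so the guard by itself does not change what is passed to `draw.text` in the case from the first post.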
#6
Reader
Posts: 520
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
Quote:
The PIL documentation contains an example in which Unicode text is passed to draw.text: Code:
font = ImageFont.truetype("symbol.ttf", 16, encoding="symb")
draw.text((0, 0), unichr(0xF000 + 0xAA), font=font)
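In that example the argument to `draw.text` is a Unicode string holding a single code point, U+F0AA in the Private Use Area, rather than UTF-8 bytes. A quick Python 3 check (`chr` is the Python 3 counterpart of `unichr`) shows how different the two representations are:

```python
symbol = chr(0xF000 + 0xAA)       # one Unicode character, U+F0AA
encoded = symbol.encode('utf-8')  # the same character as UTF-8 bytes

print(hex(ord(symbol)))  # 0xf0aa
print(len(symbol))       # 1
print(len(encoded))      # 3
print(encoded)           # b'\xef\x82\xaa'
```

Code that treated its input as bytes would therefore receive three separate values here instead of one character.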
#7
creator of calibre
Posts: 45,609
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That code is in there for a reason. And u'La Raz\xc3\xb3n' is not a double encoding; it is an ASCII representation, to make sure your problem isn't coming from the Python interpreter parsing the .py file incorrectly.
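For reference, Python 3's built-in `ascii()` returns the ASCII-only, escaped form of a string, which makes it easy to compare the two escaped spellings that come up in this thread. A short sketch in Python 3, where `str` corresponds to Python 2's `unicode`:

```python
import unicodedata

title = 'La Razón'             # typed directly; depends on the source file encoding
escaped = ascii(title)         # ASCII-only repr of the same string
print(escaped)                 # 'La Raz\xf3n'

one_char = 'La Raz\xf3n'       # 'ó' as a single code point, U+00F3
two_chars = 'La Raz\xc3\xb3n'  # the two UTF-8 byte values of 'ó', stored as two characters

print(one_char == title)       # True
print(two_chars == title)      # False

# The two extra characters are not 'ó' at all
print([unicodedata.name(c) for c in two_chars[6:8]])
# ['LATIN CAPITAL LETTER A WITH TILDE', 'SUPERSCRIPT THREE']

# Re-encoding those characters as Latin-1 recovers the UTF-8 bytes, which decode to 'ó'
print(two_chars.encode('latin-1').decode('utf-8') == title)  # True
```

Both literals are pure ASCII in the source file, but only the first denotes the same text as 'La Razón'.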
#8
Reader
Posts: 520
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
I don't know the reason the code is there. Maybe there used to be a reason which is no longer valid.
And u'La Raz\xc3\xb3n' is a double encoding; it is not an ASCII representation of u'La Razón'. That would be u'La Raz\xf3n', which is probably what you meant. What you have written is the UTF-8 encoding of ó, written as ASCII escapes inside a Unicode string. Putting UTF-8 bytes into a Unicode string is almost always wrong. If I print that string it outputs La RazÃ³n, which is exactly the text I got in the masthead image, showing that the UTF-8 encoding the code does should not be done.

And the parser didn't parse the file incorrectly, because I had a # -*- coding: utf-8 -*- line and saved the file in UTF-8. To safeguard against source-encoding problems, \xf3 could indeed be used, but then title.encode('utf-8') would still cause the wrong rendering.

Fredrik Lundh, the author of PIL, also says that the text can be a Unicode string if the font you use supports Unicode. Here is an example that you can try to see that it works.
Quote:
#9
creator of calibre
Posts: 45,609
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Fix committed.
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| Problem with masthead image | kindle3reader | Kindle Formats | 0 | 02-03-2011 05:29 PM |
| Unicode characters OK in text but wrong in TOC | paulpeer | ePub | 8 | 01-15-2010 07:17 PM |
| Using Unicode Fonts for darker text | Damætas | Kindle Developer's Corner | 11 | 04-19-2009 04:44 PM |
| Converting non-ascii/non-unicode text - pictures the way to go? | politicorific | Workshop | 5 | 04-02-2009 06:59 AM |
| Problem with preprocess_regexps and Unicode | mccande | Calibre | 8 | 12-19-2008 10:26 AM |