MobileRead Forums > E-Book Software > Calibre > Recipes

12-26-2011, 05:26 PM   #1
pietvo
Reader
Posts: 519
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
Problem with Unicode masthead text

I am working on a new recipe for the newspaper La Razón (Bolivia), because the old one no longer works since the website changed. I have the new recipe mostly working; it just needs some aesthetic changes. I will post it when it is finished.

The paper's website no longer has a suitable masthead image, so I used:
Code:
    def get_masthead_title(self):
        return u'La Razón'
However, the masthead image generator encodes the title as UTF-8, which produces strange characters instead of the 'ó'.

PIL's draw.text method doesn't expect a UTF-8 encoded byte string, but it does accept a normal Unicode string. So I changed generate_masthead in calibre/ebooks/__init__.py to eliminate the conversion to UTF-8, and that solved the problem. Essentially, the line text = title.encode('utf-8') should be removed and title used in place of text.
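A minimal stdlib-only sketch of the symptom, written for Python 3 and without PIL. It assumes, as the garbled masthead suggests, that a renderer given a byte string draws one Latin-1 character per byte; the function name misrendered is hypothetical.

```python
def misrendered(title):
    """Simulate drawing a UTF-8-encoded title one byte per glyph.

    Assumption: the renderer maps each byte to the Latin-1 character
    with the same value, which matches the garbled masthead output.
    """
    utf8_bytes = title.encode('utf-8')   # U+00F3 becomes the two bytes C3 B3
    return utf8_bytes.decode('latin-1')  # 0xC3 -> U+00C3, 0xB3 -> U+00B3


garbled = misrendered(u'La Raz\xf3n')  # u'La Raz\xf3n' is u'La Razón'
```

Here garbled comes out as u'La RazÃ³n': two characters where the 'ó' should be. Passing the Unicode title straight through avoids that round trip.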


Here is the diff:
Code:
diff -u /Applications/calibre.app/Contents/Resources/Python/site-packages/calibre/ebooks/__init__.py.\~1\~ /Applications/calibre.app/Contents/Resources/Python/site-packages/calibre/ebooks/__init__.py
--- /Applications/calibre.app/Contents/Resources/Python/site-packages/calibre/ebooks/__init__.py.~1~    2011-12-26 18:40:30.000000000 +0100
+++ /Applications/calibre.app/Contents/Resources/Python/site-packages/calibre/ebooks/__init__.py    2011-12-26 23:03:10.000000000 +0100
@@ -240,11 +240,10 @@
         font = ImageFont.truetype(font_path, 48)
     except:
         font = ImageFont.truetype(default_font, 48)
-    text = title.encode('utf-8')
-    width, height = draw.textsize(text, font=font)
+    width, height = draw.textsize(title, font=font)
     left = max(int((width - width)/2.), 0)
     top = max(int((height - height)/2.), 0)
-    draw.text((left, top), text, fill=(0,0,0), font=font)
+    draw.text((left, top), title, fill=(0,0,0), font=font)
     if output_path is None:
         f = StringIO()
         img.save(f, 'JPEG')
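The idea behind the patch can be sketched in Python 3 as a small normalising step: make sure the value handed to draw.textsize() and draw.text() is a Unicode string, instead of encoding it. The helper name masthead_text and the assumption that byte-string titles are UTF-8 are hypothetical, not taken from calibre's code:

```python
def masthead_text(title, encoding='utf-8'):
    """Return the masthead title as a Unicode string for PIL to draw.

    Hypothetical helper: byte strings are assumed to be UTF-8 and are
    decoded once; Unicode strings are returned unchanged, so no
    .encode('utf-8') happens before drawing.
    """
    if isinstance(title, bytes):
        return title.decode(encoding)
    return title
```

With this, masthead_text(u'La Raz\xf3n') and masthead_text(b'La Raz\xc3\xb3n') both give u'La Razón', and the result would take the place of text in the two drawing calls.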
12-26-2011, 05:51 PM   #2
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
You could have let me know the recipe needed an update, since I wrote the original one.
12-26-2011, 08:15 PM   #3
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
The updated recipe will be included in the next release of Calibre.

https://bugs.launchpad.net/calibre/+bug/908912
12-27-2011, 03:52 AM   #4
pietvo
Reader
Posts: 519
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
Quote:
Originally Posted by kiklop74
You could have let me know the recipe needed an update, since I wrote the original one.
You are right, but I hadn't noticed your email address. I only noticed it after I finished the recipe. Anyway, this was my first recipe so it was a good exercise for me. Thanks for supplying a new one.

However, this topic is about the problem with Unicode masthead titles. Shall I make a bug report for it?

By the way, I spent a very nice holiday in Argentina last year: 4 days in BA and 4 days in Iguazu, on the road to Bolivia.
12-27-2011, 04:31 AM   #5
kovidgoyal
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
IIRC PIL requires UTF-8 encoded text. There's probably some double encoding going on somewhere. Try this instead:

text = title.encode('utf-8') if isinstance(title, unicode) else title

And also use

u'La Raz\xc3\xb3n' instead of u'La Razón'
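The conditional above uses Python 2's unicode type. In Python 3, where str is the Unicode type and bytes the encoded one, the same guard could be written as follows (to_utf8 is a hypothetical name). It encodes Unicode text once and passes byte strings through unchanged:

```python
def to_utf8(title):
    # Encode Unicode text to UTF-8; leave byte strings as they are, so
    # already-encoded input is never encoded a second time.
    return title.encode('utf-8') if isinstance(title, str) else title
```

For example, to_utf8(u'La Raz\xf3n') returns b'La Raz\xc3\xb3n', and to_utf8(b'La Raz\xc3\xb3n') returns the same bytes. The result is still a byte string, which is exactly what the rest of the thread disputes PIL should receive.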

Last edited by kovidgoyal; 12-27-2011 at 04:34 AM.
12-27-2011, 10:10 AM   #6
pietvo
Reader
Posts: 519
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
Quote:
Originally Posted by kovidgoyal
IIRC PIL requires UTF-8 encoded text. There's probably some double encoding going on somewhere.
Actually, I tried it out with my patch, and PIL happily accepted the Unicode text and generated a proper image. Moreover, doing the UTF-8 encoding, as the current code does, gives the wrong result. I think it is Qt that requires UTF-8, not PIL.
The PIL documentation contains an example where Unicode text is given to draw:

Code:
 
font = ImageFont.truetype("symbol.ttf", 16, encoding="symb")     
draw.text((0, 0), unichr(0xF000 + 0xAA))
I think the draw.text call should also have the parameter font=font.
Quote:
Originally Posted by kovidgoyal
There's probably some double encoding going on somewhere. Try this instead:

text = title.encode('utf-8') if isinstance(title, unicode) else title

And also use

u'La Raz\xc3\xb3n' instead of u'La Razón'
That seems very wrong to me (forcing a double encoding yourself).
12-27-2011, 10:34 AM   #7
kovidgoyal
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That code is in there for a reason. And u'La Raz\xc3\xb3n' is not a double encoding; it is an ASCII representation, to ensure your problem isn't coming from the Python interpreter parsing the .py file incorrectly.
12-27-2011, 03:57 PM   #8
pietvo
Reader
Posts: 519
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
I don't know the reason the code is there. Maybe there used to be a reason which is no longer valid.

And u'La Raz\xc3\xb3n' is a double encoding; it is not an ASCII representation of u'La Razón'. That would be u'La Raz\xf3n', which is probably what you meant. What you wrote is the UTF-8 encoding of ó, written as escapes inside a Unicode string. Putting UTF-8 bytes in a Unicode string is almost always wrong. If I print that string, it outputs La RazÃ³n, which is exactly the text I got in the masthead image, showing that the UTF-8 encoding the code does should not be done. And the parser didn't parse the file incorrectly, because I had a # -*- coding: utf-8 -*- line and saved the file as UTF-8. To safeguard against source-encoding problems, \xf3 could indeed be used, but then title.encode('utf-8') would still cause the wrong rendering.
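The two escape forms can be compared directly. A short Python 3 sketch (stdlib only) showing that \xf3 is the single character ó, while \xc3\xb3 inside a Unicode literal is two characters, the UTF-8 bytes of ó reinterpreted as Latin-1:

```python
# '\xf3' in a Unicode literal is U+00F3, the character 'ó' itself.
correct = u'La Raz\xf3n'
# '\xc3\xb3' stores the two UTF-8 bytes of 'ó' as two separate code
# points, U+00C3 and U+00B3.
mixed = u'La Raz\xc3\xb3n'

print(len(correct), len(mixed))  # 8 9
# The mixed literal is exactly the UTF-8 encoding of the correct title
# decoded as Latin-1, i.e. the same garbling seen in the masthead.
print(mixed == correct.encode('utf-8').decode('latin-1'))  # True
```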

Fredrik Lundh, the author of PIL, also says that the text can be a Unicode string if the font you use supports Unicode. Here is an example you can try to see that it works:

Quote:
# -*- coding: utf-8 -*-
import ImageFont, Image, ImageDraw
s = u'La Razón € ñ'
font = ImageFont.truetype('/System/Library/Fonts/LucidaGrande.ttc', 18, encoding='unic')
print font.getsize(s)
im = Image.new('RGB', (200,200))
draw = ImageDraw.Draw(im)
draw.text((40,40), s, font=font)
im.show()
12-27-2011, 10:40 PM   #9
kovidgoyal
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Fix committed.
Similar Threads
Thread Thread Starter Forum Replies Last Post
Problem with masthead image kindle3reader Kindle Formats 0 02-03-2011 04:29 PM
Unicode characters OK in text but wrong in TOC paulpeer ePub 8 01-15-2010 06:17 PM
Using Unicode Fonts for darker text Damætas Kindle Developer's Corner 11 04-19-2009 03:44 PM
Converting non-ascii/non-unicode text - pictures the way to go? politicorific Workshop 5 04-02-2009 05:59 AM
Problem with preprocess_regexps and Unicode mccande Calibre 8 12-19-2008 09:26 AM


All times are GMT -4.