Old 06-14-2012, 02:12 PM   #1
atordo
Connoisseur
Posts: 89
Karma: 19669
Join Date: Apr 2012
Device: Kindle Touch
Can't match Unicode character

In the good old days when men were men and characters were single bytes, humble ASCII had just two quotes: single (') and double ("). Now we have opening and closing, single and double, low and high, with and without angle, normal and heavy, reversed or not, and all their combinations. Some of them make recipes fail with the following error:
Code:
UnicodeEncodeError: 'charmap' codec can't encode character u'\u201c' in position 30: character maps to <undefined>
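An error of the same kind can be reproduced outside calibre in a few lines. This is only a sketch in Python 3, and cp850 is an assumption: the traceback does not say which charmap-based codec was in use, so it stands in for any code page that has no curly quotes.

```python
# Sketch (Python 3). cp850 is an assumed stand-in for the unnamed
# charmap-based codec in the traceback; it has no curly quotes.
quote = '\u201c'  # LEFT DOUBLE QUOTATION MARK

caught = None
try:
    quote.encode('cp850')
except UnicodeEncodeError as err:
    caught = err
    print(err)

# UTF-8 can represent the character, so encoding to it does not fail.
utf8_bytes = quote.encode('utf-8')
print(utf8_bytes)  # b'\xe2\x80\x9c'
```

The failure happens at encoding time, that is, when already decoded text is written out in a codec that lacks the character.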
The solution seems easy, doesn't it? Just replace the char in question with an HTML entity. So, blatantly plagiarizing the Danas recipe by Darko Miletic (I must thank Darko and the other recipe authors, because I do this all the time), I added the following to preprocess_regexps:
Code:
,(re.compile(u'\u201c'), lambda match: '&ldquo;') # left double quotation mark
There are other entries as well, because several chars produce the error, but one is enough as an example. The problem is that the character is never matched. I've tried other forms, like '\xe2\x80\x9c' (the three bytes that make up the char in UTF-8) and pasting the char directly into the recipe, to no avail. The chars pass through untouched, as can be verified by inspecting the HTML files when running ebook-convert on the command line.
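To make the two forms concrete, here is a minimal sketch in plain Python 3, outside calibre. It does not diagnose the attached recipe. It only shows that the \u201c pattern lines up with decoded text, while the three-byte form lines up with undecoded bytes. The helper apply_rules and the sample markup are hypothetical, invented for illustration.

```python
import re

# Sample page content as UTF-8 bytes (invented for this sketch).
raw = b'<p>\xe2\x80\x9cHola\xe2\x80\x9d</p>'
text = raw.decode('utf-8')

# Same shape as the preprocess_regexps entry in the post:
# (compiled pattern, replacement function).
rules = [
    (re.compile('\u201c'), lambda match: '&ldquo;'),  # left double quotation mark
    (re.compile('\u201d'), lambda match: '&rdquo;'),  # right double quotation mark
]

def apply_rules(html, rules):
    """Hypothetical helper: apply each (pattern, function) pair in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html

# The text pattern matches once the bytes have been decoded as UTF-8.
fixed_text = apply_rules(text, rules)
print(fixed_text)  # <p>&ldquo;Hola&rdquo;</p>

# The three-byte form only matches data that is still raw bytes.
byte_pattern = re.compile(b'\xe2\x80\x9c')
fixed_bytes = byte_pattern.sub(b'&ldquo;', raw)
print(fixed_bytes)  # b'<p>&ldquo;Hola\xe2\x80\x9d</p>'
```

If neither form matches inside a recipe, one thing worth checking is what kind of data (bytes or decoded text, and decoded with which encoding) the regexps actually receive.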

I attach my failed recipe (as a zip to preserve UTF-8) in the hope that somebody can solve it. TIA.
Attached Files
File Type: zip rt-recipe.zip (1.4 KB, 204 views)
Old 06-14-2012, 03:27 PM   #2
kovidgoyal
creator of calibre
Posts: 45,279
Karma: 27111060
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Try setting encoding to different values in your recipe. utf8, latin1, cp1252, cp1251 are popular.
Old 06-15-2012, 03:20 PM   #3
atordo
Thanks for the suggestion, but the page really uses UTF-8. Setting the encoding to other values just adds garbage chars in the text.
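The garbage chars are easy to reproduce. The sketch below (Python 3, with an invented sample string) decodes the same UTF-8 bytes with each of the encodings suggested above and checks whether the left double quote survives as a single character.

```python
# Sample text containing curly quotes, encoded as a UTF-8 page would be.
raw = '\u201cHola\u201d'.encode('utf-8')

found = {}
for enc in ('utf-8', 'latin-1', 'cp1252', 'cp1251'):
    # errors='replace' keeps the loop going if a byte is undefined in enc.
    decoded = raw.decode(enc, errors='replace')
    found[enc] = '\u201c' in decoded
    print(enc, repr(decoded), found[enc])
```

Only the utf-8 pass yields a \u201c character. With the other encodings each quote turns into three unrelated chars, so a pattern for \u201c has nothing to match.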

I'm afraid this may require more complex solutions. I'll have a look at builtin recipes for more inspiration and report back when/if I make any progress.
All times are GMT -4.