09-16-2011, 08:40 AM   #17
dave9000
Junior Member
Posts: 7
Karma: 10
Join Date: Apr 2010
Location: Italy
Device: SONY PRS-650, IREX ILIAD
There's a small problem with character encoding (windows-1252).
E.g. this sentence:
"... uno dei piÃ¹ famosi scrittori ..."
should be
"... uno dei più famosi scrittori ..."

It looks like the original text ends up encoded TWICE with UTF-8: it is already UTF-8, but it gets decoded as windows-1252 and then encoded as UTF-8 again.
Thanks for the useful plugin anyway!
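
For anyone who wants to reproduce the effect, here is a minimal Python 3 sketch of how this kind of double encoding can arise. The sample string and variable names are only illustrative, and I am assuming the downloaded bytes are UTF-8, which is what the symptom suggests.

```python
# Hypothetical sample text; the real input is whatever the plugin downloads.
text = "uno dei più famosi scrittori"

# Bytes as served (assumed UTF-8): 'ù' becomes the two bytes C3 B9.
raw = text.encode("utf-8")

# Decoding with the wrong codec maps each byte to its own character:
# C3 -> 'Ã' and B9 -> '¹'.
mangled = raw.decode("windows-1252")

# Writing the result back out as UTF-8 encodes those two characters again,
# so the original 'ù' now occupies four bytes instead of two.
twice = mangled.encode("utf-8")

print(mangled)  # uno dei piÃ¹ famosi scrittori
print(twice)
```

The four bytes C3 83 C2 B9 in the final output are the "encoded twice" form of a single 'ù'.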

P.S. the following patch in worker.py seems to fix the issue:
OLD: raw = raw.decode('windows-1252', errors='replace')
NEW: raw = raw.decode('utf-8', errors='replace')
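
One caveat with the one-line change: if a page really is windows-1252, decoding it as UTF-8 with errors='replace' turns accented letters into replacement characters. A possible alternative, sketched below in Python 3, is to try UTF-8 first and fall back to windows-1252. This is only a sketch and not what the patch above does; `decode_page` is a hypothetical helper name, not something that exists in worker.py.

```python
def decode_page(raw: bytes) -> str:
    """Hypothetical helper: prefer strict UTF-8, fall back to windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume a legacy windows-1252 page.
        return raw.decode("windows-1252", errors="replace")


sample = "uno dei più famosi scrittori"  # illustrative text only

as_utf8 = decode_page(sample.encode("utf-8"))
as_cp1252 = decode_page(sample.encode("windows-1252"))

# What the patched line alone would do to windows-1252 input:
patched = sample.encode("windows-1252").decode("utf-8", errors="replace")

print(as_utf8 == sample, as_cp1252 == sample)  # True True
print("\ufffd" in patched)                     # True
```

The fallback can still guess wrong for byte sequences that happen to be valid in both encodings, so it is a heuristic rather than real charset detection.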

Last edited by dave9000; 09-16-2011 at 01:27 PM.