Thread: web2lrf
Old 11-28-2007, 02:27 AM   #76
DaveNB
Connoisseur
DaveNB has a complete set of Star Wars action figures.
 
Posts: 86
Karma: 399
Join Date: Jun 2007
Device: Nook, Sony PRS-500, Nokia 770 (FBReader)
Quote:
Originally Posted by kovidgoyal
The problem with wired is that the files are encoded in UTF8 but they specify the encoding as iso8859-1. You can try either
1) Contact wired
2) write a preprocess regexp that changes the specified encoding
Code:
(r'<meta http-equiv="Content-Type" content="text/html; charset=(\S+)"',
 lambda match : match.group().replace(match.group(1), 'UTF-8'))
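For illustration, here is a minimal sketch of what that rule does when applied with plain re.sub. The apply_rule helper and the sample meta tag are made up for this example; how web2lrf actually applies its preprocess regexps is not shown in this thread.

```python
import re

# The (pattern, replacement) pair quoted above, unchanged.
rule = (r'<meta http-equiv="Content-Type" content="text/html; charset=(\S+)"',
        lambda match: match.group().replace(match.group(1), 'UTF-8'))

def apply_rule(html, rule):
    """Hypothetical helper: run one (pattern, function) pair over the HTML."""
    pattern, func = rule
    return re.sub(pattern, func, html)

# Made-up sample header that declares the wrong charset.
sample = '<meta http-equiv="Content-Type" content="text/html; charset=iso8859-1" />'
fixed = apply_rule(sample, rule)
print(fixed)
```

Note that this only rewrites the declared charset in the markup. It helps only if whatever decodes the page afterwards actually reads that declaration.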
I see. I tried changing wired.py to specify an iso8859-1 encoding, but that didn't fix the problem; the apostrophes are still funny. Will keep hacking at it. I also tried searching for the exact hex sequence that is causing trouble and replacing it with a normal apostrophe, without success:

(r'\xE2\x80\x99', lambda match: "'"),
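One possible reason a pattern like this fails to match, sketched below in Python 3 as an assumption and not as a confirmed diagnosis: whether it matches depends on how the page bytes were decoded before the regexp runs. The sample string is invented, and behaviour under the Python 2 of the time may differ.

```python
import re

# Invented sample: UTF-8 bytes for "Wired" + curly apostrophe (U+2019) + "s".
raw = b"Wired\xe2\x80\x99s"

# The pattern from the post. On text, \xE2 \x80 \x99 mean the three
# code points U+00E2, U+0080, U+0099, not three raw bytes.
pattern = r'\xE2\x80\x99'

decoded_utf8 = raw.decode("utf-8")         # one real U+2019 character
decoded_latin1 = raw.decode("iso8859-1")   # U+00E2 U+0080 U+0099
decoded_cp1252 = raw.decode("cp1252")      # U+00E2 U+20AC U+2122

def fix(text):
    """Apply the posted replacement to already-decoded text."""
    return re.sub(pattern, "'", text)

latin1_fixed = fix(decoded_latin1)      # pattern matches here
utf8_unchanged = fix(decoded_utf8)      # no match: text holds U+2019
cp1252_unchanged = fix(decoded_cp1252)  # no match: different code points

# After a correct UTF-8 decode, replace the real character instead.
clean = decoded_utf8.replace("\u2019", "'")
print(latin1_fixed, clean)
```

In this sketch the pattern only matches text that was decoded as iso8859-1. If the text was decoded as UTF-8 or as Windows-1252, the three code points it looks for are not there.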



Any ideas?

Dave

Last edited by DaveNB; 11-28-2007 at 03:25 AM.