Thread: web2lrf
11-28-2007, 01:56 AM   #75
kovidgoyal
creator of calibre
The problem with Wired is that the files are encoded in UTF-8, but they declare the encoding as ISO-8859-1. You can either:
1) Contact Wired
2) Write a preprocess regexp that changes the declared encoding
Code:
(r'<meta http-equiv="Content-Type" content="text/html; charset=(\S+)"',
 lambda match : match.group().replace(match.group(1), 'UTF-8'))
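To check what the regexp does before putting it in a profile, you can run it standalone. This is only a rough sketch: it assumes the pair is applied with a plain re.sub, which may not be exactly how web2lrf applies it, and fix_declared_charset and the sample tag are made up for illustration.

```python
import re

# The (pattern, replacement function) pair from the post above.
pattern, fix = (
    r'<meta http-equiv="Content-Type" content="text/html; charset=(\S+)"',
    lambda match: match.group().replace(match.group(1), 'UTF-8'),
)


def fix_declared_charset(html):
    """Hypothetical helper: rewrite the charset declared in the meta tag.

    Assumes the pair is applied via re.sub; the actual preprocessing
    step in web2lrf may differ.
    """
    return re.sub(pattern, fix, html)


# Made-up sample tag that declares the wrong encoding.
sample = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />'
fixed = fix_declared_charset(sample)
print(fixed)
```

Only the declared charset changes; the bytes of the page are left alone, which is what you want here since the content is already UTF-8.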