http://docs.python.org/library/codec...dard-encodings
It looks like your HTML file has a mix of encodings. Some program in the past converted a single-byte encoding to UTF-16 by blindly copying each byte into the low byte of a UTF-16 code unit and leaving the high byte as 0.
I'd guess the encoding for your file is cp1252 once you strip out all 0 bytes.
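If that guess is right, something along these lines should recover the text. This is only a rough sketch: the function name and the sample bytes are made up for illustration, and it assumes every 0 byte in the file is padding rather than real data.

```python
def strip_nulls_and_decode(raw: bytes, encoding: str = "cp1252") -> str:
    """Drop all 0x00 bytes, then decode the rest as a single-byte encoding."""
    cleaned = raw.replace(b"\x00", b"")
    # cp1252 leaves a few byte values undefined; errors="replace" turns
    # those into U+FFFD instead of raising UnicodeDecodeError.
    return cleaned.decode(encoding, errors="replace")


# Made-up sample: "café" widened byte by byte, followed by untouched cp1252 text.
sample = b"c\x00a\x00f\x00\xe9\x00 au lait"
print(strip_nulls_and_decode(sample))  # café au lait
```

For a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes in. If the result still looks wrong, try another single-byte codec from the standard encodings list, such as latin_1.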