MobileRead Forums - View Single Post - 'utf8' codec can't decode byte 0xb1 in position 18: invalid start byte

KevinH · 11-15-2012, 12:30 PM

Hi Kovid,

To work around these issues, ePubFixer uses its own copy of zipfile.py, called zipfilerugged.py. It is simply the official zipfile.py with one change: it explicitly catches the decode error raised when a central directory filename contains garbage bytes but is flagged as UTF-8 encoded (and they are garbage chars, not utf-16, as far as I can tell).

Code:

    def _decodeFilename(self):
        if self.flag_bits & 0x800:
            try:
                return self.filename.decode('utf-8')
            except UnicodeDecodeError:
                # Garbage bytes flagged as UTF-8: fall back to the raw name.
                return self.filename
        else:
            return self.filename

This prevents zipfilerugged.py from barfing out when simply trying to open the bad zip.

Then, to fix the cases where the central and local filenames differ (again because of garbage chars in some central directory filenames), ePubFixer uses the following code, which imports zipfilerugged.py:

(see attached zipfix.py)
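
For readers without the attachment, here is a rough sketch of how the central/local mismatch can be detected. This is not zipfix.py: it is written for Python 3 with the stock zipfile module, and the helper names (local_filename, find_name_mismatches) are made up for illustration. It also assumes the central directory names decode at all; stock zipfile may refuse to open an archive whose UTF-8-flagged names are garbage, which is the case the patched zipfilerugged.py covers.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a ZIP local file header.
LOCAL_SIG = b"PK\x03\x04"
LOCAL_FMT = "<4s5H3L2H"
LOCAL_SIZE = struct.calcsize(LOCAL_FMT)


def local_filename(fileobj, header_offset):
    """Return the raw filename bytes stored in the local header (hypothetical helper)."""
    fileobj.seek(header_offset)
    fixed = fileobj.read(LOCAL_SIZE)
    if len(fixed) != LOCAL_SIZE or fixed[:4] != LOCAL_SIG:
        raise ValueError("no local file header at offset %d" % header_offset)
    name_len = struct.unpack(LOCAL_FMT, fixed)[9]  # field 9 is the name length
    return fileobj.read(name_len)


def find_name_mismatches(fileobj):
    """List (central_name, local_name) byte pairs that disagree (hypothetical helper)."""
    mismatches = []
    with zipfile.ZipFile(fileobj) as zf:
        for info in zf.infolist():
            # Re-encode the central name the same way zipfile decoded it.
            encoding = "utf-8" if info.flag_bits & 0x800 else "cp437"
            central = info.orig_filename.encode(encoding)
            local = local_filename(fileobj, info.header_offset)
            if central != local:
                mismatches.append((central, local))
    return mismatches


# Small demo on a well-formed in-memory archive.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("mimetype", b"application/epub+zip")
print(find_name_mismatches(demo))
```

Once a mismatch is found, a fixer would presumably trust the local name and rewrite the archive, but that part is left out here.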

Your new approach of reading the entire zip by processing only the local header information should be more robust and closer to what B&N is using, but ePubFixer works too.
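
To make the "local information only" idea concrete, here is a minimal Python 3 sketch of walking the local file headers from the start of the file and ignoring the central directory entirely. It is only an illustration under stated assumptions, not calibre's actual implementation: iter_local_entries is a made-up name, it handles only stored and deflated entries, and it gives up on entries that use a data descriptor (flag bit 3), where the sizes in the local header are not reliable.

```python
import io
import struct
import zipfile
import zlib

LOCAL_SIG = b"PK\x03\x04"
LOCAL_FMT = "<4s5H3L2H"  # fixed 30-byte part of a local file header
LOCAL_SIZE = struct.calcsize(LOCAL_FMT)


def iter_local_entries(fileobj):
    """Yield (raw_name, data) by reading local headers only (hypothetical helper)."""
    offset = 0
    while True:
        fileobj.seek(offset)
        fixed = fileobj.read(LOCAL_SIZE)
        if len(fixed) < LOCAL_SIZE or fixed[:4] != LOCAL_SIG:
            break  # reached the central directory or the end of the file
        (_sig, _ver, flags, method, _time, _date,
         _crc, csize, _usize, name_len, extra_len) = struct.unpack(LOCAL_FMT, fixed)
        if flags & 0x08:
            raise NotImplementedError("data descriptor entries are not handled")
        name = fileobj.read(name_len)
        fileobj.seek(extra_len, 1)  # skip the extra field
        raw = fileobj.read(csize)
        if method == 0:      # stored
            data = raw
        elif method == 8:    # deflated (raw stream, no zlib header)
            data = zlib.decompress(raw, -15)
        else:
            raise NotImplementedError("compression method %d" % method)
        offset += LOCAL_SIZE + name_len + extra_len + csize
        yield name, data


# Small demo on an in-memory archive written by the stock zipfile module.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("mimetype", b"application/epub+zip")
for name, data in iter_local_entries(demo):
    print(name, len(data))
```

The names come back as raw bytes on purpose, so a garbage name never stops the walk; decoding can be attempted afterwards with whatever fallback makes sense.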

Hope this helps.

KevinH