MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

DiapDealer · 03-24-2013, 09:24 AM

Quote:

Originally Posted by KevinH

Okay, I think xmlescape and HTMLparser both work better with full unicode strings. At that point, all metadata has already been encoded as utf-8, so I have modified mobi_opf.py to convert all required pieces from utf-8 to full unicode, pass them through the xmlescape and escape methods, and then convert them back to the utf-8 needed for the opf file.

Just a heads up: besides the handleTag and handleMetaPairs methods, there are three more places in the mobi_opf script where data gets the unescape->escape treatment.

Would it make sense to do something similar (full unicode) in those additional three locations?
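
For anyone following along, here is roughly what I mean by the full-unicode round trip. This is only a sketch in Python 3 using standard library calls (html.unescape and xml.sax.saxutils.escape), not the actual code in mobi_opf.py; the function name roundtrip_metadata is made up for illustration.

```python
from html import unescape
from xml.sax.saxutils import escape


def roundtrip_metadata(raw: bytes) -> bytes:
    """Hypothetical helper: unescape->escape on a unicode string, UTF-8 in and out."""
    text = raw.decode("utf-8")             # utf-8 bytes -> full unicode string
    text = unescape(text)                  # resolve existing entities, e.g. &amp; -> &
    text = escape(text, {'"': "&quot;"})   # re-escape &, <, > and " for XML output
    return text.encode("utf-8")            # back to utf-8 for the opf file


if __name__ == "__main__":
    sample = "Caf\u00e9 &amp; Bar <1>".encode("utf-8")
    print(roundtrip_metadata(sample).decode("utf-8"))
```

Unescaping before escaping keeps already-escaped input from being double-escaped, so running the helper twice on the same data gives the same bytes as running it once.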