- name = txtdata[offset:offset+ilen]
+ name = unicode(txtdata[offset:offset+ilen], 'windows-1252').encode('utf-8')
I do not think the CTOC strings in index sections are always encoded as windows-1252.
I think the mobi header gives the proper encoding. If so, we will need to pass the encoding from mobi_unpack into mobi_ncx and mobi_opf, and convert each bytestring from the specified encoding to a UTF-8 bytestring.
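A rough sketch of what that conversion could look like, assuming the text-encoding field in the mobi header holds a Windows codepage number (1252 for windows-1252, 65001 for UTF-8). The names MOBI_CODEPAGES and to_utf8 are made up for illustration and are not part of mobi_unpack, and falling back to windows-1252 for an unknown codepage is my assumption:

```python
# Hypothetical helper, not taken from mobi_unpack: maps the codepage number
# read from the mobi header to a Python codec name.
MOBI_CODEPAGES = {
    1252: 'windows-1252',
    65001: 'utf-8',
}

def to_utf8(raw, codepage):
    """Convert a bytestring in the book's declared encoding to a UTF-8 bytestring."""
    # Assumption: treat any unrecognized codepage as windows-1252.
    codec = MOBI_CODEPAGES.get(codepage, 'windows-1252')
    # 'replace' keeps a damaged or undefined byte from aborting the whole unpack.
    return raw.decode(codec, 'replace').encode('utf-8')
```

With something like this, the patched line would become name = to_utf8(txtdata[offset:offset+ilen], codepage), where codepage is whatever value mobi_unpack reads from the header and hands down to mobi_ncx and mobi_opf.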
Originally Posted by nleblanc88
I'd like to contribute v060 if I could. What this version fixes:
Encoding chapter names in UTF-8. This prevents the NCX and OPF files from being written in a non-UTF-8 encoding.
In my tests, chapter names with UTF-8 characters were not being written properly to the resulting .NCX file. This caused the file charset to be reported as "unknown-8bit", and trying to parse these files resulted in errors.
This patch fixes this issue. I've attached the source.
I'd also like to bring up the idea of setting up a git repository for this project (bitbucket.com or github.com). I'd love to keep contributing to this project, and I think this would not only make it easier for me and others to do so, but also help the author keep track of all versions. I'd be willing to set this up if anybody would like.