Thanks for your testing. I will look at all of the issues you pointed out, but I am most interested in the issues with encodings. This version should work better since UTF-8 can encode all possible characters. Did you run from the command line or via the GUI? The GUI log window should show all characters correctly. Does it?
If you are running from the command line on Windows, the best way to run the program is to change your codepage to cp65001 first. If you do that, does it work?
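For consoles where switching the codepage is not an option, here is a minimal sketch of another way to keep debug output readable. It is written for Python 3 and is not what the program currently does; the helper name console_safe is hypothetical. It re-encodes text for whatever encoding the console reports and turns anything that encoding cannot represent into backslash escapes instead of garbage.

```python
import sys

def console_safe(text, encoding=None):
    """Return text limited to characters the target encoding can represent.

    Hypothetical helper: characters the encoding cannot represent are
    replaced with backslash escapes such as \\u041f.
    """
    # Fall back to UTF-8 when the stream does not report an encoding.
    enc = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(enc, errors="backslashreplace").decode(enc)

# Typical use for a diagnostic line:
#     print(console_safe(message))
```

With a Cyrillic codepage such as cp866 the Russian text passes through unchanged; with an ASCII-only console it degrades to escapes that can still be decoded by hand.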
Originally Posted by Sergey Dubinets
v0.61beta works well.
Here are some comments so far:
1. mobi_ncx.py:9 We don't need to import readTagSection and getVariableWidthValue into this module.
2. The program can print nice diagnostics. The problem is that it prints UTF-8 strings to the console. This works only for
English text (at least on Windows). When I debug Russian books I see less readable debug output.
3. escape/unescape in OPF. You recently added HTMLParser.unescape(). Are you sure that the original values are
escaped? Unescaping values that are not escaped would be a bug.
Using saxutils.escape() is correct for text nodes:
data.append('<%s>%s</%s>\n' % (tag, xmlescape(self.h.unescape(value)), closingTag))
But it is not sufficient for attribute values:
data.append('<meta name="%s" content="%s" />\n' % (name, xmlescape(self.h.unescape(value))))
In the latter case you also need to escape " as &quot; and ' as &apos;.
I suggest you use quoteattr() for attributes instead of escape().
4. mobi_unpack.py:621 Why don't you use the setsectiondescription() method? The same applies to 6 other locations in the same
5. mobi_unpack.py:704 Redundant call. The same at 696, 697, 698.
6. mobi_unpack.py:905 method is never used
7. mobi_unpack.py:608 duplicate map entry
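To illustrate item 3, here is a small sketch of the difference between escape() and quoteattr(), and of why unescaping a value that was never escaped is risky. It assumes Python 3, where html.unescape() is the counterpart of HTMLParser.unescape(); the function names text_node and meta_node are hypothetical and are not taken from mobi_unpack.py. Note that quoteattr() returns the value with its surrounding quotes already added, so the format string must not add its own.

```python
import html
from xml.sax.saxutils import escape, quoteattr

def text_node(tag, value):
    """Build an element; escape() handles &, < and > in text content."""
    return '<%s>%s</%s>\n' % (tag, escape(value), tag)

def meta_node(name, value):
    """Build a <meta> element; quoteattr() escapes the value and quotes it."""
    return '<meta name=%s content=%s />\n' % (quoteattr(name), quoteattr(value))

# A value containing both kinds of quotes plus markup characters.
value = 'He said "hi" & it\'s <ok>'
title_xml = text_node('title', value)
meta_xml = meta_node('note', value)

# Unescaping is only safe when the input really was escaped.
escaped_input = html.unescape('AT&amp;T')                  # 'AT&T'
raw_input = html.unescape('how to write &lt; in XML')      # the literal "&lt;" is lost
```

Both generated strings parse back to the original value with an XML parser, while the last line shows text being silently changed because it was unescaped without having been escaped first.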