12-31-2012, 10:49 AM   #464
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi Sergey,

> 1. mobi_ncx.py:9: we don't need to import readTagSection and getVariableWidthValue into this module.

Yes. Since the earlier refactoring, these are no longer needed.

> 2. The program can print nice diagnostics. The problem is that it prints UTF-8 strings to the console. This works only for English text (at least on Windows). When I debug Russian books, the debug output is much less readable.

No, actually utf-8 can represent any character in any language. The problem is that Windows does not use cp65001 (utf-8) for its console but some other code page that cannot represent all possible characters. Worse, Windows allows file names and paths to contain full Unicode names that cannot be represented in the console's limited 8-bit encoding. This is a serious bug, since someone can send you files that you cannot access in any way from python or the console.
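A small sketch of the point above, assuming Python 3 and using cp1252 (Western European) and cp866 (Russian DOS console) as example legacy code pages: utf-8 can encode every sample string, while each 8-bit code page fails on scripts it does not cover. The helper name can_encode is hypothetical.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


samples = ["Hello", "Привет", "日本語"]
for enc in ("utf-8", "cp1252", "cp866"):
    # utf-8 handles all three; the 8-bit code pages only cover some scripts.
    print(enc, [can_encode(s, enc) for s in samples])
```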

Using utf-8 (cp65001) should allow python code to access any file or path on your system, even if it is written in Japanese or Chinese, let alone Russian. I was hoping that since the Tk widgets in the Mobi_Unpack GUI use utf-8 internally, the Log window of the Mobi_Unpack GUI front-end would show characters properly no matter what (unless you only have non-Unicode-capable fonts installed).

If you use the command line/console, the user should be able to change the code page to 65001 (utf-8), for example by running chcp 65001, and have things work for any file or path in command line/console mode. I might be able to wrap stdout so that it converts output back to the console encoding, but the better solution is to use a console encoding that can represent all characters (cp65001 = utf-8).
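If the console cannot be switched to cp65001, the stdout wrapper idea could look roughly like this minimal sketch, assuming Python 3.7+ (for TextIOWrapper.reconfigure). It keeps the console's own encoding but replaces unrepresentable characters with \uXXXX escapes instead of crashing. This is a lossy fallback, not how Mobi_Unpack currently handles output, and console_safe is a hypothetical helper name.

```python
import io
import sys


def console_safe(text: str, encoding: str) -> str:
    """Return text with characters that `encoding` cannot represent
    replaced by backslash escapes such as \\u041f, so printing never fails."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


if __name__ == "__main__":
    # Keep the console's encoding but stop raising UnicodeEncodeError on
    # characters it cannot show (only possible on a real TextIOWrapper).
    if isinstance(sys.stdout, io.TextIOWrapper):
        sys.stdout.reconfigure(errors="backslashreplace")
    # On a cp1252 console the Cyrillic title comes out as \u escapes.
    print(console_safe("Война и мир", "cp1252"))
```

The escapes are still ugly for Russian text, which is why switching the console itself to utf-8 remains the better fix.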

So if you get a chance, please try it both ways and see what it takes to get both the console and gui mode to work properly.

The real problem is Windows allows full unicode file and path names but then uses a console encoding (and possibly fonts) that will not properly show the full range of characters. This is silly in the extreme (imho).


> 3. escape/unescape in OPF. You recently added HTMLParser.unescape().
> Are you sure that the original values are escaped? Unescaping values that are not escaped would be a bug.
> Using saxutils.escape() is correct for text nodes:
>     data.append('<%s>%s</%s>\n' % (tag, xmlescape(self.h.unescape(value)), closingTag))
> but it is not sufficient for attribute values:
>     data.append('<meta name="%s" content="%s" />\n' % (name, xmlescape(self.h.unescape(value))))
>
> In the latter case you also need to escape " as &quot; and ' as &apos;.
> I suggest you use quoteattr() for attributes instead of escape().

DiapDealer is working on fixing a problem with some Mobi ebooks that include HTML in their metadata when they technically should not. Since the opf is an XML document, we cannot allow any HTML into the metadata values that we then convert into the proper XML opf entries.
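One possible way to keep HTML out of a metadata value is to keep only its text content. This is a hedged sketch using the standard-library html.parser module (Python 3), not DiapDealer's actual fix; strip_html is a hypothetical name. Note that the result is plain, unescaped text, so it must still be escaped when written into the opf and must not be unescaped a second time.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only character data; tags are dropped."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(value: str) -> str:
    """Return the text content of an HTML fragment (hypothetical helper)."""
    parser = _TextOnly()
    parser.feed(value)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(strip_html("<p>A <b>bold</b> &amp; brave story</p>"))
```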

I am not up-to-speed on what he wants to do here so I will ask DiapDealer to look at this again to make sure your concerns are dealt with.
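For reference, Sergey's two concerns can be sketched as follows, assuming Python 3, where html.unescape replaces the removed HTMLParser.unescape(). escape() covers &, < and > in text nodes; quoteattr() additionally deals with quote characters and supplies the surrounding quotes itself, so the format string must not add its own. The function names are hypothetical, not the ones in mobi_opf.

```python
from html import unescape
from xml.sax.saxutils import escape, quoteattr


def opf_text_element(tag: str, value: str) -> str:
    """<tag>value</tag> for a text node; escape() handles &, < and >."""
    return "<%s>%s</%s>\n" % (tag, escape(value), tag)


def opf_meta_element(name: str, value: str) -> str:
    """<meta> element; quoteattr() picks safe quotes and escapes the rest."""
    return "<meta name=%s content=%s />\n" % (quoteattr(name), quoteattr(value))


# Only unescape values known to arrive HTML-escaped; unescaping a raw value
# would silently turn literal text such as "&lt;" into "<".
print(opf_meta_element("description", unescape('Tom &amp; "Jerry"')), end="")
print(opf_text_element("dc:title", unescape("War &amp; Peace")), end="")
```

If explicit entities are preferred inside a fixed pair of double quotes, escape(value, {'"': "&quot;", "'": "&apos;"}) is the alternative Sergey describes.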

> 4. mobi_unpack.py:621: Why don't you use the setsectiondescription() method?
> The same applies to 6 other locations in the same file.

fixed

> 5. mobi_unpack.py:704: Redundant call. The same at 696, 697 and 698.

Removed, since it is duplicated in init; ditto for the others.

> 6. mobi_unpack.py:905 method is never used

It is used when debugging the rawml; it is just not called in this version of the file. Keeping it causes no harm.

> 7. mobi_unpack.py:608: duplicate map entry

Fixed by removing the duplicate.
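Duplicate entries like this are easy to miss because a Python dict literal silently keeps only the last value for a repeated key. A minimal sketch of a guard, assuming the map is built from (key, value) pairs; build_map and the "AUTH" key are hypothetical, not taken from mobi_unpack.py.

```python
def build_map(pairs):
    """Build a dict from (key, value) pairs, refusing duplicate keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate map entry: %r" % (key,))
        result[key] = value
    return result


# A dict literal silently drops the first value for a repeated key.
silent = {"AUTH": "Author", "AUTH": "Creator"}
print(silent)

try:
    build_map([("AUTH", "Author"), ("AUTH", "Creator")])
except ValueError as err:
    print(err)
```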


Thanks!

KevinH