Hi tkeo,
One other thing, I am not a big fan of minidom at all. It seems generally bloated and barfs if any true unicode is used (at least on 2.X). I see you wrote both a xml.dom.minidom version and a regular expression version of things. Every time I have used a xml elementTree or some other XML parser (either standard package or add-ons) in python 2.X I have run into problem cases that simply do not parse well or get confused with encodings, resulting in non-robust operation on some platforms (Mac, Win, or Linux).
So unless you feel strongly about it (and given the re vs dom code sizes are about the same), I would rather stick with regular expressions version as they are easier for people to modify and fix are are robust to most encoding issues.
I see you have also written a metadata parsing routine that supports epub 3 like "refines" on named items. This is quite nice but using it in epub 2 spec devices might cause problems.
I really think we should incorporate your code and try and create an epub 3 generator version of KindleUnpack to stay in epub 3 space and not try to mix private extensions into what is primarily epub 2 code.
What do you think?
KevinH
|