Hi KevinH,
Quote:
Originally Posted by KevinH
One other thing, I am not a big fan of minidom at all. It seems generally bloated and barfs if any true unicode is used (at least on 2.X). I see you wrote both a xml.dom.minidom version and a regular expression version of things. Every time I have used a xml elementTree or some other XML parser (either standard package or add-ons) in python 2.X I have run into problem cases that simply do not parse well or get confused with encodings, resulting in non-robust operation on some platforms (Mac, Win, or Linux).
So unless you feel strongly about it (and given the re vs dom code sizes are about the same), I would rather stick with the regular expression version, as regular expressions are easier for people to modify and fix and are robust to most encoding issues.
I see you have also written a metadata parsing routine that supports epub 3 like "refines" on named items. This is quite nice but using it in epub 2 spec devices might cause problems.
|
Firstly, I have no reason to stick with the dom, so I will revert the code to use re. Parsing the RESC section with the dom makes the code look simpler and shorter than with re. (The re version needs a Metadata class I wrote, whereas the dom version does not.) I think the dom is well suited to representing an epub structure; however, for now it is less familiar and less stable than re.
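To illustrate the comparison above, here is a minimal sketch of both approaches on a small spine fragment. The XML sample is made up for illustration (I am assuming an OPF-like fragment; real RESC content may differ), and this is not the actual KindleUnpack code.

```python
import re
from xml.dom import minidom

# Hypothetical OPF-like fragment; real RESC content may differ.
xml = ('<spine>'
       '<itemref idref="p1" properties="page-spread-left"/>'
       '<itemref idref="p2" properties="page-spread-right"/>'
       '</spine>')

# re version: find each itemref tag, then pull the attributes out of it.
re_result = []
for tag in re.findall(r'<itemref\b[^>]*>', xml):
    attrs = dict(re.findall(r'([\w:-]+)\s*=\s*"([^"]*)"', tag))
    re_result.append((attrs.get('idref'), attrs.get('properties')))

# dom version: the parser already knows about elements and attributes.
dom = minidom.parseString(xml)
dom_result = [(node.getAttribute('idref'), node.getAttribute('properties'))
              for node in dom.getElementsByTagName('itemref')]
```

Both lists hold the same (idref, properties) pairs for this sample. The re version keeps working on input that a strict parser might reject, which matches your point about robustness.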
In my environment (Python 2.7.6, Windows 32-bit), minidom is able to parse UTF-8 input, but it stores the result as Unicode strings, so the values have to be re-encoded to UTF-8 before use. This is quite confusing to me. (UTF-8 is one of the encodings of Unicode, isn't it?) If minidom stored elements as UTF-8 strings, it would be much easier, I think.
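A small sketch of what I mean, using a made-up title element as sample data (the variable names are only for illustration):

```python
# -*- coding: utf-8 -*-
from xml.dom import minidom

# UTF-8 encoded bytes, as they would be read from a file (hypothetical sample).
raw = (u'<?xml version="1.0" encoding="utf-8"?>'
       u'<metadata><title>\u66f8\u540d</title></metadata>').encode('utf-8')

doc = minidom.parseString(raw)
title = doc.getElementsByTagName('title')[0].firstChild.data

# minidom returns decoded text (unicode on Python 2), not UTF-8 bytes,
# so it has to be encoded again before mixing it with UTF-8 byte strings.
title_utf8 = title.encode('utf-8')
```

Here `title` is a Unicode string and `title_utf8` is the byte string that the rest of a byte-oriented (re based) code path would expect.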
Quote:
I really think we should incorporate your code and try and create an epub 3 generator version of KindleUnpack to stay in epub 3 space and not try to mix private extensions into what is primarily epub 2 code.
What do you think?
|
I have considered it. It might be better to create a pure epub 3 version of KindleUnpack, separate from the epub 2 version, since pure epub 3 ebooks will become popular while many epub 2 books will remain because they have no need for the epub 3 features.
But right now many epub 2 ebooks are available while epub 3 books are not so popular, and, as you mentioned, vendors publish books that are basically epub 2 with some epub 3 definitions included. So I think it is not yet time to create a pure epub 3 version of KindleUnpack.
Quote:
Thank you for the examples, I will play around with them. I can't believe that a Kindle device supports the spine/page spread properties by keeping and parsing the RESC section on the fly during reading. My guess is they must include or encode that information in some other way, but that is just a guess and I could be wrong.
|
As for the RESC section, my guess is that it is there to store information that the K8 format does not define. It is just a guess; I have no evidence.
Thanks,
tkeo