Yes, the original OPF is broken. An href should never need XML escaping, because it should be URL-encoded (%XX) *before* it is written to the OPF XML file. If that had been done first, there would be nothing left to XML-escape!
So whoever built that href in the original source has a bad bug, especially if they start using the XML entity character "&" in filenames.
I will look into a way to detect hrefs that are XML-escaped but not URL-encoded and try to fix them, but this may not be easy.
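To make the ordering concrete, here is a rough Python sketch. It is not Sigil's actual code; the function names `href_for_opf` and `repair_raw_href` and the entity regex are made up for illustration. It assumes the href is handed over as the raw attribute text, exactly as it appears in the OPF file, and that a "%" already in the href marks an existing URL escape.

```python
import html
import re
from urllib.parse import quote

# Hypothetical heuristic: the raw href text contains an XML entity or
# a numeric character reference.
_XML_ENTITY = re.compile(r"&(amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);")


def href_for_opf(filename):
    """URL-encode a manifest path before it is written into the OPF.

    After quoting, no '&', '<', '>' or quote characters remain, so
    there is nothing left for XML escaping to change.
    """
    return quote(filename, safe="/")


def repair_raw_href(raw_href):
    """Best-effort repair of an href that was XML-escaped instead of
    URL-encoded (assumption: raw_href is the undecoded attribute text).
    """
    if _XML_ENTITY.search(raw_href):
        # Undo the XML escaping first, e.g. '&amp;' -> '&'.
        raw_href = html.unescape(raw_href)
    # Keep '%' safe so hrefs that are already URL-encoded are not
    # encoded twice. A literal '%' in a filename stays ambiguous.
    return quote(raw_href, safe="/%")


print(href_for_opf("Text/a&b.html"))         # correct order
print(quote("Text/a&amp;b.html", safe="/"))  # the failure reported here
print(repair_raw_href("Text/a&amp;b.html"))  # repaired
```

The second `print` reproduces the broken `%26amp%3B` form, because the escaped text is URL-encoded as-is. The repair is only a heuristic: a filename that really contains the literal text `&amp;` would be "fixed" incorrectly, which is part of why this may not be easy.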
Good test case!
Kevin
Quote:
Originally Posted by un_pogaz
And what gets deleted are the HTML files, because they are not indexed.
(And I did a test with an empty NCX and same problem)
EDIT: Arf. I don't feel like I'm being clear (even to myself), and now that I have the 3 <item> entries the problem is obvious to me.
In the original ePub, the file name reference is in XML format. On loading, it is translated directly into a "web URL", transforming the &amp; into %26amp%3B instead of %26, which breaks the reference to the object.
0.9.18 seems to detect the format correctly and therefore avoids the parse error.