11-10-2019, 09:25 AM   #17
KevinH
Sigil Developer
Yes, the original opf is broken. An href should never need to be XML-escaped, because it should already be URL-encoded (%XX) *before* it is written to the opf XML file. If that were done first, there would be nothing left to XML-escape!
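A minimal Python sketch of that ordering, using a hypothetical filename that contains "&". Percent-encoding first leaves nothing for XML escaping to change. XML-escaping first and then URL-encoding that text produces exactly the kind of %26amp%3B breakage described in the quoted post. This only illustrates the ordering; it is not Sigil's actual code.

```python
from urllib.parse import quote, unquote
from xml.sax.saxutils import escape

filename = "Tom & Jerry.xhtml"  # hypothetical filename containing "&"

# Correct order: percent-encode the path first.
# quote() turns " " into %20 and "&" into %26, so no "&", "<" or ">" remain.
href = quote(filename)
assert escape(href) == href  # XML escaping is now a no-op

# Broken order: XML-escape the raw name, then URL-encode the escaped text.
# "&amp;" becomes "%26amp%3B", which no longer decodes back to the filename.
bad = quote(escape(filename))
assert unquote(bad) != filename
```

Encoding once, in the right layer, keeps the href a plain URL that survives any XML serializer untouched.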

So whoever built that href in the original source has a bad bug, especially since they are using "&", the character XML uses to start entities, in filenames.

I will look into a way to detect hrefs that are not URL-encoded but are XML-escaped instead, and try to fix them, but this may not be easy.
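One possible heuristic, sketched in Python with a hypothetical `repair_href` helper: decode any %XX escapes, and if XML entity residue such as "&amp;" shows up, unescape it and percent-encode the result exactly once. It assumes a manifest-style relative href with no fragment or query part. It is also ambiguous by nature: a file whose real name literally contains "&amp;" would be "repaired" too, which is part of why this is not easy.

```python
import html
import re
from urllib.parse import quote, unquote

# XML entity residue: named entities XML defines, plus numeric references.
_ENTITY = re.compile(r"&(?:amp|lt|gt|quot|apos|#\d+|#x[0-9a-fA-F]+);")


def repair_href(href: str) -> str:
    """Hypothetical helper: normalise an href that may have been XML-escaped
    before (or instead of) being percent-encoded.

    Assumes a relative manifest href with no '#fragment' or '?query'.
    """
    decoded = unquote(href)  # e.g. "%26amp%3B" -> "&amp;"
    if _ENTITY.search(decoded):
        # Undo the stray XML escaping ("&amp;" -> "&", "&#x26;" -> "&").
        decoded = html.unescape(decoded)
    # Re-encode once; quote() keeps "/" so path separators survive.
    return quote(decoded)
```

Applied to the broken value it turns "Tom%20%26amp%3B%20Jerry.xhtml" back into "Tom%20%26%20Jerry.xhtml", while an href that was already encoded correctly comes back unchanged.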


Good test case!

Kevin

Quote:
Originally Posted by un_pogaz
And what gets deleted are the HTML files, because they are not indexed.

(And I did a test with an empty NCX and same problem)

EDIT: Argh. I don't think I was being clear (even to myself), but now that I have the 3 <item> entries, the problem is obvious to me.

In the original ePub, the filename in the href is XML-escaped. On loading, it is converted directly into a URL, turning &amp; into %26amp%3B instead of %26, which breaks the reference to the file.

0.9.18 seems to detect the format correctly and therefore avoids the parse error.