11-10-2019, 09:12 AM | #16 |
Grand Sorcerer
Posts: 27,583
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
No. They're deleted because they're not manifested in the opf. Same as Sigil has always done. We just have to figure out why the url encoding of the hrefs is happening differently in 0.9.991 compared to 0.9.18 and causing the files to be technically unmanifested.
The puzzling part is that according to the snippets you pasted above, it looks like 0.9.18 is the one that's incorrectly url-encoding the href in the manifest. Last edited by DiapDealer; 11-10-2019 at 09:15 AM. |
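To make "technically unmanifested" concrete, here is a minimal Python sketch (not Sigil's code) that lists archive entries no manifest href resolves to once the hrefs are url-decoded. The function name `unmanifested_files` and the sample paths are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import unquote

OPF_NS = "{http://www.idpf.org/2007/opf}"

def unmanifested_files(opf_xml, opf_path, archive_names):
    """Return archive entries that no manifest href points to."""
    root = ET.fromstring(opf_xml)
    opf_dir = posixpath.dirname(opf_path)
    manifested = set()
    for item in root.iter(OPF_NS + "item"):
        # hrefs are relative to the opf and are expected to be url-encoded
        href = unquote(item.get("href", ""))
        manifested.add(posixpath.normpath(posixpath.join(opf_dir, href)))
    # Files that are never listed in the manifest by design
    ignored = {"mimetype", opf_path}
    return sorted(
        name for name in archive_names
        if name not in manifested
        and name not in ignored
        and not name.startswith("META-INF/")
    )

SAMPLE_OPF = (
    '<package xmlns="http://www.idpf.org/2007/opf"><manifest>'
    '<item id="a" href="Text/ch%201.xhtml" media-type="application/xhtml+xml"/>'
    '</manifest></package>'
)
SAMPLE_NAMES = [
    "mimetype",
    "META-INF/container.xml",
    "OEBPS/content.opf",
    "OEBPS/Text/ch 1.xhtml",
    "OEBPS/Text/orphan.xhtml",
]
print(unmanifested_files(SAMPLE_OPF, "OEBPS/content.opf", SAMPLE_NAMES))
```

If the encoding of an href changes so that it no longer decodes to the real file name, that file lands in this list even though the author meant to manifest it.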
11-10-2019, 09:25 AM | #17 | |
Sigil Developer
Posts: 7,696
Karma: 5444398
Join Date: Nov 2009
Device: many
|
Yes, the original opf is broken. An href should never be xml encoded because it should in fact be url encoded %XX *before* being written to the opf xml file. If that was done first, there would be nothing to ever xml escape!
So whoever built that href in the original source has a bad bug, especially if they go and start using the xml entity character "&" in filenames. I will look into a way to detect non-url-encoded hrefs that are actually xml escaped and try to fix them, but this may not be easy. Good test case! Kevin
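The ordering argument can be sketched in Python as follows. This is only an illustration, assuming a hypothetical helper `href_for_opf` and a made-up file name: once the name is percent-encoded, no `&`, `<`, `>` or `"` is left, so the xml escaping step has nothing to change.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

def href_for_opf(filename):
    """Build an href attribute value: url-encode first, xml-escape second."""
    # Step 1: percent-encode everything except unreserved characters and "/".
    url_encoded = quote(filename, safe="/")
    # Step 2: xml-escape for use inside a double-quoted attribute.
    # After step 1 this is a no-op, because "&" is already "%26".
    return escape(url_encoded, {'"': "&quot;"})

print(href_for_opf("Text/Tom & Jerry.xhtml"))
```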
|
|
|
11-10-2019, 09:27 AM | #18 |
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
I downloaded the update in Google Chrome. It produced a warning that the file is not commonly downloaded and may be dangerous - this seems to be a false warning as my virus checker revealed no issues and the checksum works out OK.
|
11-10-2019, 09:39 AM | #19 | |
Grand Sorcerer
Posts: 27,583
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
This is a pre-release, so it may never see the hits necessary for the "not commonly downloaded" warning to go away. It's a BS warning to scare people away from using software downloaded from the internet. By now, I figure people either trust us or they don't (after verifying through checksums and signatures that the upload hasn't been tampered with, of course). Last edited by DiapDealer; 11-10-2019 at 09:44 AM. |
|
11-10-2019, 09:50 AM | #20 |
Grand Sorcerer
Posts: 27,583
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
It looks to me (just by glancing with the naked eye) that 0.9.18 is correctly adjusting the incorrect xml ampersand entity in the href, while 0.9.991 is doubling down on it. Perhaps the hrefs should be passed through an entity-to-character routine before being re-url-encoded? That's just a quick WAG on my part, though.
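A rough Python sketch of that idea, not Sigil's implementation: turn stray xml entities back into characters, undo any existing percent-encoding, then url-encode once. The name `normalize_href` is hypothetical, and the sketch assumes the href carries no fragment or query part.

```python
from urllib.parse import quote, unquote
from xml.sax.saxutils import unescape

# saxutils.unescape handles &amp; &lt; &gt; by default; add the two quote entities.
EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}

def normalize_href(href):
    """Entity-to-character pass, then decode, then a single url-encode."""
    plain = unescape(href, EXTRA_ENTITIES)   # "&amp;" -> "&"
    plain = unquote(plain)                    # "%20" -> " " (avoids double encoding)
    return quote(plain, safe="/")

print(normalize_href("Tom &amp; Jerry.xhtml"))
print(normalize_href("Tom%20%26%20Jerry.xhtml"))
```

One caveat of this approach: a file whose name literally contains the text `&amp;` would be rewritten as well, so it trades one edge case for another.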
|
|
11-10-2019, 09:55 AM | #21 |
Chalut o/
Posts: 411
Karma: 145324
Join Date: Dec 2017
Device: Kobo
|
Well, the problem is now known. I hope it can be solved, because otherwise I can't even imagine the number of ePubs that could be broken by this behavior change.
If not all XML entities can be checked, then at least the ampersand should be. Many programs normalize hrefs to ASCII (that is what was done here: "Operation Astree" should be written "Opération Astrée"), so the ampersand is the only common entity left (given that less-than and greater-than are prohibited). Last edited by un_pogaz; 11-10-2019 at 09:59 AM. |
11-10-2019, 10:17 AM | #22 |
Grand Sorcerer
Posts: 27,583
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I can imagine the numbers. I imagine the number of epubs being created with illegally xml-encoded hrefs and/or filenames will be quite, quite low. But we'll still try to fix it if we can.
|
11-10-2019, 11:34 AM | #23 |
Sigil Developer
Posts: 7,696
Karma: 5444398
Join Date: Nov 2009
Device: many
|
Those epubs would/should never pass epubcheck at all as even the spaces in the original opf manifest are not properly url encoded either! That manifest item is just all wrong!
Please report this error to whoever generates crap like that so that they fix their "software". As for Sigil, I have modified the Utility.cpp URLDecode and URLEncode routines to try to *un* xmlescape any hrefs before running any URL Encode or Decode on them. That seems to do the trick in this case, but that epub clearly violates the spec and probably will not work on many ebook devices! For their own sake, they should fix that crap. People keep forgetting that these things represent actual files on actual filesystems and all of the problems and limitations that entails ... illegal characters, spaces, file paths too long, case sensitive vs case insensitive vs case insensitive but case remembered, etc. And they should remember that not every valid file system path is representable on another device. Good epub devs will stick to simple, short filenames restricted to the set of characters that exist on all devices and that do not have unicode normalization issues depending on a specific sequence to generate/build-up the character, as filesystems do not all handle that the same way. Hopefully with the latest commits to master, the next release of Sigil will be more robust to broken opf hrefs. Last edited by KevinH; 11-10-2019 at 11:45 AM. |
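The filename advice can be turned into a small lint check. The Python sketch below is only an illustration: the allowed character set, the length limit `MAX_LEN` and the function name `filename_problems` are arbitrary assumptions, not anything taken from Sigil or the epub spec.

```python
import re
import unicodedata

# Assumed "safe everywhere" character set and an arbitrary length limit.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")
MAX_LEN = 64

def filename_problems(name):
    """Return a list of portability concerns for a single file name."""
    problems = []
    if not SAFE_NAME.fullmatch(name):
        problems.append("characters outside A-Z a-z 0-9 . _ -")
    if len(name) > MAX_LEN:
        problems.append("name longer than %d characters" % MAX_LEN)
    # Composed vs decomposed forms differ, so filesystems may disagree on the name.
    if unicodedata.normalize("NFC", name) != unicodedata.normalize("NFD", name):
        problems.append("NFC and NFD forms differ")
    return problems

print(filename_problems("chapter_01.xhtml"))
print(filename_problems("Opération Astrée.xhtml"))
```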
11-10-2019, 12:24 PM | #24 |
Sigil Developer
Posts: 7,696
Karma: 5444398
Join Date: Nov 2009
Device: many
|
FWIW, the difference in this between Sigil-0.9.18 and Sigil-0.9.991 is that under 0.9.18 the QXmlStream reader automatically decoded the xml escaping (and still does under 0.9.991), but 0.9.18 completely recreates the opf with what we actually found in the zip, and that does the proper url encoding when creating the new opf. Under 0.9.991 we do not create the opf; we instead use it exactly as it is found inside the epub, which results in a broken opf since the original opf was in fact broken. All we do then is try to properly url encode the hrefs as in the spec, and we end up url encoding the bad xml escaped string.
If we truly need to, we can go back to assuming all opfs are broken, get what we can from it and create a valid opf from scratch. That just seems to defeat the whole purpose of loading an epub "as-is". In any event, it is probably a good idea for users to use Doitsu's wonderful epubcheck plugin immediately after loading a strange epub just to let you know what is broken, so it can be fixed before using Sigil's more advanced features. |
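To see why url encoding the still-escaped string breaks the link to the file, here is a small Python illustration. The file name, the href and the helper names are all hypothetical, and how the escaped text reaches the encoder inside Sigil is not modelled here.

```python
from urllib.parse import quote, unquote
from xml.sax.saxutils import unescape

ON_DISK = "Tom & Jerry.xhtml"        # hypothetical file name inside the zip
RAW_HREF = "Tom &amp; Jerry.xhtml"   # hypothetical href, xml-escaped but not url-encoded

def encode_as_found(href):
    """Url-encode the text exactly as found, entity and all."""
    return quote(href, safe="/")

def encode_after_unescape(href):
    """Turn the entity back into "&" first, then url-encode."""
    return quote(unescape(href), safe="/")

def matches_file(encoded_href, filename=ON_DISK):
    """Does the url-decoded href name the file that is actually on disk?"""
    return unquote(encoded_href) == filename

print(encode_as_found(RAW_HREF), matches_file(encode_as_found(RAW_HREF)))
print(encode_after_unescape(RAW_HREF), matches_file(encode_after_unescape(RAW_HREF)))
```

In the first case the `&amp;` text itself gets percent-encoded, so the decoded href no longer names any file in the archive and the file looks unmanifested.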
11-10-2019, 12:25 PM | #25 |
Grand Sorcerer
Posts: 27,583
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Surprisingly enough, EpubCheck doesn't seem to care one bit about an xml entity being in an opf manifest href. It cares more about the space in the href (and that's just a warning, not an error). Even if checked before Sigil attempts to urlencode the href.
Clearly, the file name itself can't have an actual xml entity in it. Neither Sigil nor Epubcheck seem to allow that. Surprises the heck out of me. I can confirm that your fix seems to handle this situation well enough, though. Last edited by DiapDealer; 11-10-2019 at 12:31 PM. |
11-10-2019, 12:26 PM | #26 |
Grand Sorcerer
Posts: 27,583
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
|
11-10-2019, 12:50 PM | #27 |
Grand Sorcerer
Posts: 27,583
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
The attached epub, for instance, passes EpubCheck with flying colors. Even though it has an opf manifest href of "file&amp;name.xhtml" pointing to a file named file&name.xhtml.
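One plausible reason a checker built on an XML parser accepts such an href: the parser decodes `&amp;` inside an attribute value to a literal `&` before any other code sees it. The Python sketch below shows only that parser behavior with a made-up opf fragment; it makes no claim about how EpubCheck works internally, and `manifest_hrefs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"

def manifest_hrefs(opf_xml):
    """Return manifest hrefs as the XML parser reports them (entities decoded)."""
    root = ET.fromstring(opf_xml)
    return [item.get("href") for item in root.iter(OPF_NS + "item")]

SAMPLE_OPF = (
    '<package xmlns="http://www.idpf.org/2007/opf"><manifest>'
    '<item id="x" href="file&amp;name.xhtml" media-type="application/xhtml+xml"/>'
    '</manifest></package>'
)
print(manifest_hrefs(SAMPLE_OPF))
```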
|
11-10-2019, 01:47 PM | #28 |
Chalut o/
Posts: 411
Karma: 145324
Join Date: Dec 2017
Device: Kobo
|
I have no idea who created this ePub, when and with which software.
The last known software to have touched this ePub is calibre 0.8.44 in 2012. Is it the culprit? Unlikely, but it remains to be seen. I agree that this ePub does not respect the standard, but I think that "old" ePubs like this one were created by software that, at the time, was only in the early stages of what it is today. And I think this mistake is in many more books than we think; it has simply gone unnoticed all this time. Because this error is incredibly subtle: from an XML point of view it is not an error! Until now it was not even known or treated as such. It was only fixed because the processing steps happened to run in the right order. Last edited by un_pogaz; 11-10-2019 at 02:03 PM. |
11-10-2019, 02:10 PM | #29 |
Grand Sorcerer
Posts: 27,583
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Meaningless when Epub is not confined solely to xml rules.
Neither of us can really say how many epubs this sort of thing might occur in, but it's still my contention that it's not nearly as prevalent as you seem to want it to be. As for calibre being the last software to touch the epub, well, that's not really indicative of much of anything either. Calibre's editor prides itself on making as few changes to an epub's code as possible, and will only attempt to fix things when specifically asked to. I can confirm that calibre will correct this situation only when given permission to do so after running the "Check Book" tool. Regardless ... Kevin has already pushed a commit that seems to fix the issue for Sigil as well. Just keep in mind that we won't always be able to save everyone from themselves automatically (with non-invasive changes). Garbage in is still garbage in. |
11-11-2019, 02:22 AM | #30 |
just an egg
Posts: 1,589
Karma: 4300000
Join Date: Mar 2015
Device: Kindle, iOS
|
I've spent a little time playing with Sigil 0.9.991 on macOS High Sierra, and so far so good. I also opened 0.9.991 on Catalina, but haven't done anything further than to make sure it launches successfully.
Sigil itself is running fine so far; however, I've run into one plugin (InsertImageSVG) that fails when it comes to epubs with non-Sigil-standard structures. Using Tools > Restructure Epub to Sigil Norm solves the problem. QUESTION: I understand KevinH and DiapDealer went to great lengths to enable Sigil to accept and handle any epub structure. Is there an advantage to leaving the epub in its original non-Sigil-standard structure? Is there any reason that I should not use Tools > Restructure Epub to Sigil Norm as soon as I load an epub into Sigil as part of my workflow now? Last edited by odamizu; 11-11-2019 at 03:23 AM. Reason: updated info |