#16
Sigil Developer
Posts: 8,775
Karma: 6000000
Join Date: Nov 2009
Device: many
For speed reasons, Sigil only checks whether any errors exist when loading a document. Sigil's importer is not a full validator, nor does it have the interface to be one. If any errors are detected, the user is warned that errors exist.
You do not need to worry about loading a well-formed epub in Sigil, and an epub does not need to pass epubcheck before being loaded. Sigil will warn only when it finds something it does not consider to be in spec, and will offer to fix it automatically for you. I think you see that message about the possibility of data loss, and it gives you pause. Any fears here are very overblown. Actual data loss is very, very rare and only happens when the xhtml text is not parseable at all (i.e. so broken that gumbo's parser cannot read it). Simple things like a missing doctype, a missing html tag, etc., will always be happily and safely fixed by Sigil's gumbo parser during Mend.
In versions before Sigil 1.0, Sigil just quietly fixed these errors for you while moving all files to their Sigil standard locations. From Sigil-1.0 to Sigil-1.3, once moving to "standard locations" stopped, errors due to missing doctypes started to happen but were not fixed or detected until another Sigil tool that needed gumbo was run. From Sigil 1.4 onwards, we properly detect the missing doctype and warn the user on load so that they run Mend (after load), or we do it for them, depending on user Preferences.
If you truly do not want to auto run Mend on open, set it to that in your Preferences. If you want to see what is broken that Mend fixes, checkpoint the epub after load, run Mend, and then compare the result against the checkpoint.
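For anyone who keeps copies of a file outside Sigil, the "checkpoint, Mend, compare" idea can be approximated with a plain text diff. This is only a sketch using Python's standard difflib; the `before` and `after` strings are made-up stand-ins for a file before and after Mend, not actual Sigil output, and `show_changes` is a hypothetical helper name.

```python
import difflib

def show_changes(before, after, name="file.xhtml"):
    """Return a unified diff between two versions of one file's text."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile=f"checkpoint/{name}", tofile=f"mended/{name}",
        lineterm=""))

# Hypothetical example: the only change is an added doctype line.
before = "<html>\n<body>\n<p>Hello</p>\n</body>\n</html>"
after = "<!DOCTYPE html>\n" + before

diff = show_changes(before, after)
for line in diff:
    print(line)
```

Lines starting with `+` in the output are what the repair step added, so a diff that shows only a doctype line tells you nothing else was touched.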
#17
Grand Sorcerer
Posts: 5,730
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
I just re-tested this and Calibre Editor correctly escaped & as %26. (I don't know how I ended up with &amp;.)
#18
Sigil Developer
Posts: 8,775
Karma: 6000000
Join Date: Nov 2009
Device: many
That is strange. So Calibre wants to % encode the & in the filenames in manifest href urls too. That is the spec.
So perhaps a conversion or plugin in calibre from another format generates these broken manifest entries? Either way, this is not a Sigil bug. If desired, the user can unzip the broken epub and use Add Existing to add in the xhtml files, images and css to properly create an epub with a working opf. But the right way to maximize compatibility with epub readers is to not use reserved chars inside filenames in the first place.
Last edited by KevinH; 03-20-2021 at 06:57 PM.
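To make the encoding point concrete, here is a small sketch using Python's standard library. It is not Sigil's or Calibre's code, and the filename is made up; it only shows the difference between percent-encoding a name for a manifest href and merely XML-escaping it.

```python
from urllib.parse import quote, unquote
from xml.sax.saxutils import escape

def href_for(filename):
    """Percent-encode a book-relative path for use in a manifest href.

    safe="/" keeps path separators as-is; '&' is not in the safe set,
    so it becomes %26.
    """
    return quote(filename, safe="/")

name = "Text/Tom & Jerry.xhtml"   # hypothetical filename
href = href_for(name)
print(href)                        # Text/Tom%20%26%20Jerry.xhtml
print(unquote(href) == name)       # True: decoding restores the name

# XML-escaping alone is not url encoding; it leaves an entity in the href.
print(escape(name))                # Text/Tom &amp; Jerry.xhtml
```

The last line is the shape of href this thread is about: fine as XML, but not percent-encoded.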
#19
Grand Sorcerer
Posts: 28,577
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
As far as I'm concerned, Sigil is handling filenames with ampersands correctly. You can rename files within the epub to contain ampersands and Sigil will properly encode the path in the opf manifest. Sigil will also open any epub that has a filename with an ampersand so long as the ampersand is encoded correctly in the opf manifest (and the path exists) to begin with.
The only time there's a problem is if an existing epub has an opf with an invalid ampersand character in it. And even then, it's not because of the illegal ampersand character (directly) that files might be discarded when subsequently saving the epub. It's the fact that the illegal character makes the path to the file incorrect, which means the file is not properly manifested. And improperly manifested files have always been discarded when opening an epub. It's no different from a misspelling or a case sensitivity issue.
Last edited by DiapDealer; 03-20-2021 at 07:46 PM.
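The "not properly manifested" idea boils down to an exact path comparison. The sketch below is an illustration only, not Sigil's import code: `unmanifested` is a hypothetical helper, the file lists are invented, and it assumes the hrefs have already been resolved relative to the archive root.

```python
from urllib.parse import unquote

def unmanifested(archive_names, manifest_hrefs):
    """Return archive files that no manifest href points at exactly."""
    manifested = {unquote(h) for h in manifest_hrefs}
    return sorted(set(archive_names) - manifested)

# Hypothetical archive contents and manifest hrefs.
archive = ["Text/ch1.xhtml", "Text/Q&A.xhtml", "Images/Cover.jpg"]
hrefs = ["Text/ch1.xhtml", "Text/Q%26A.xhtml", "Images/cover.jpg"]

print(unmanifested(archive, hrefs))   # ['Images/Cover.jpg']
```

The percent-encoded ampersand matches its file, while the case mismatch on the cover image leaves that file unmanifested, which is the same outcome a wrongly encoded path would produce.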
#20
Sigil Developer
Posts: 8,775
Karma: 6000000
Join Date: Nov 2009
Device: many
I am in 100% agreement with you. The only question in my mind is whether we could somehow make the ImportEpub.cpp code more robust to broken manifest entries without adding potential problems.
Something like take each manifest entry and try the following:
1. Try url % decoding the manifest entry to see if it matches a file in the epub zip archive.
2. If not, try xml decoding the manifest entry to see if it matches a file in the epub archive, but report a url encoding warning in the manifest.
3. Finally, if not, try taking it at face value (no xml decoding or % decoding) and see if the manifest href matches a file in the epub zip archive.
4. If not, report the manifest entry url not being present in the epub as a warning.
After doing this for every manifest entry, report any leftover files in the epub zip as unmanifested. That may help Sigil read in epubs with broken manifest hrefs.
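The fallback order proposed above could be sketched roughly as follows. This is Python rather than the C++ of ImportEpub.cpp, and it is not Sigil's implementation: `resolve_href` and `import_report` are hypothetical names, the warning strings are invented, and the sketch assumes each `href` is the raw attribute text as written in the opf (before any entity decoding) and already relative to the archive root.

```python
from urllib.parse import unquote
from xml.sax.saxutils import unescape

XML_EXTRA = {"&quot;": '"', "&apos;": "'"}

def resolve_href(href, archive_names):
    """Try the fallbacks in order; return (matched_name, warning)."""
    names = set(archive_names)
    # 1. Normal case: percent-decode the href.
    candidate = unquote(href)
    if candidate in names:
        return candidate, None
    # 2. The href still carries XML entities such as &amp;.
    candidate = unescape(href, XML_EXTRA)
    if candidate != href and candidate in names:
        return candidate, "href is not properly url encoded"
    # 3. Take the href at face value.
    if href in names:
        return href, None
    # 4. Nothing matched.
    return None, "manifest href not found in archive"

def import_report(manifest_hrefs, archive_names):
    """Resolve every href, then list archive files nothing pointed at."""
    resolved, warnings = {}, []
    for href in manifest_hrefs:
        name, warning = resolve_href(href, archive_names)
        if name is not None:
            resolved[href] = name
        if warning:
            warnings.append(f"{href}: {warning}")
    leftover = sorted(set(archive_names) - set(resolved.values()))
    return resolved, warnings, leftover

# Hypothetical archive and manifest.
archive = ["Text/ch1.xhtml", "Text/Q&A.xhtml", "Styles/extra.css"]
hrefs = ["Text/ch1.xhtml", "Text/Q&amp;A.xhtml", "Text/missing.xhtml"]
resolved, warnings, leftover = import_report(hrefs, archive)
print(resolved)
print(warnings)
print(leftover)   # ['Styles/extra.css']
```

Keeping step 2 behind a "did decoding change anything" check is a choice made for this sketch, so that step 3 stays a distinct case; the proposal itself does not spell that out.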
#21
Grand Sorcerer
Posts: 28,577
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I'd be up for anything that made ImportEpub less allergic to broken manifest entries. But are you talking about potentially fixing broken manifest entries, or just giving more meaningful warnings regarding them? The former sounds great, but I would think it could lead to other problems. The latter would still be an improvement and sounds less ... dangerous?
#22
Sigil Developer
Posts: 8,775
Karma: 6000000
Join Date: Nov 2009
Device: many
Hopefully fixing them upon import would be the goal. But you are right, this might cause issues and so would be something for Sigil-1.6.0 not Sigil-1.5.x.
#23
Hedge Wizard
Posts: 802
Karma: 19999999
Join Date: May 2011
Location: UK/Philippines
Device: Kobo Touch, Nook Simple
Rather than trying to fix the manifest, would it be easier or cause fewer problems to generate a new one?
#24
Sigil Developer
Posts: 8,775
Karma: 6000000
Join Date: Nov 2009
Device: many
Essentially we do that on import but use the existing opf to guide that. The opf is supposed to have a valid manifest and files not listed in the manifest should be ignored. The opf should also have a spine indicating the order of the xhtml files. That spine order uses the manifest ids assigned to each valid file. So having a valid opf is important. No epub should ever be without one.
There are also file naming/path issues that can cause security concerns for maliciously crafted epub/zip archives. So the best way to handle this is for the original epub to have a proper valid opf, manifest and spine included. If not, you can still open this epub in Sigil by unzipping it and using Add Existing to add the xhtml files and Sigil will properly create a manifest. You would have to determine the correct spine order from a toc or from other sources. So just ignoring the manifest and/or the opf is typically not a good idea. It can be worked around if broken and that is what we are discussing.
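To illustrate how the spine leans on the manifest ids, here is a minimal sketch that parses an opf with Python's ElementTree and reports spine entries with no matching manifest item. It is not how Sigil does it; `check_spine` is a hypothetical helper, and the sample opf is stripped down for illustration (no metadata section), so it is not a complete valid opf.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

def check_spine(opf_text):
    """Return spine idrefs that have no matching manifest item id."""
    root = ET.fromstring(opf_text)
    ids = {item.get("id")
           for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    spine = [ref.get("idref")
             for ref in root.findall("opf:spine/opf:itemref", OPF_NS)]
    return [idref for idref in spine if idref not in ids]

# Stripped-down sample for illustration only.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ch1" href="Text/ch1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="Text/ch2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
    <itemref idref="ch3"/>
  </spine>
</package>"""

print(check_spine(SAMPLE_OPF))   # ['ch3']
```

A spine entry like `ch3` has nothing to point at, which is why a file dropped from the manifest also falls out of the reading order.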
#25
Evangelist
Posts: 441
Karma: 77256
Join Date: Sep 2011
Device: none
If it were possible to distinguish such malformed-markup errors from others, such as merely fixing a doctype or other metadata, that would save a lot of time compared with having to checkpoint and compare as you suggest. I often edit outside of Sigil, as I find another editor quicker for loading an EPUB and making routine edits. I understand Sigil does not fully validate the EPUB; it would just be nice to have more clarity in the message.
#26
Hedge Wizard
Posts: 802
Karma: 19999999
Join Date: May 2011
Location: UK/Philippines
Device: Kobo Touch, Nook Simple
#27
Chalut o/
Posts: 439
Karma: 145424
Join Date: Dec 2017
Device: Kobo
That's a lot of things; I'll answer your questions as best I can.
Crap, I just realized I misread 2): false alarm, sorry.
The ePub "& amp" is the one for item 1). I want to point out that this ePub was perfectly valid in 1.4; Sigil accepted this character without complaint. 1.5 failed to load it.
The ePub "JSON" contains item 6) [copyright safe]. Video of the result in the editor: https://youtu.be/YisMuSQRtHA
For 3): yes, this is theoretically a UUID, so it's not the best example, but if I wanted to add a custom one (like "internal-editor-version"), I think having to go directly through the OPF is brutal. Maybe add a "custom identifier" entry at the end of the list that, when selected, allows direct editing of the scheme (possibly through a small dialog window). (I am speaking only about epub2.)
4) is much the same problem [image]: you can't edit the "header" of the Custom element. Maybe it's just for my own comfort, but it would be convenient not to have to think « this I can edit here, this I can't, so I have to close the window, open a file, and find the one thing I'm looking for among 40 lines of raw text ». (Also only about epub2.)
And sorry I wasn't clear in my first post.
Last edited by un_pogaz; 03-21-2021 at 06:35 AM.
#28
Grand Sorcerer
Posts: 28,577
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
In my tests, the same 10 files with illegal &amp; entities in the href attributes of the opf manifest give the exact same warning using both Sigil 1.4 and 1.5. Both versions (1.4 and 1.5) also discard the same 10 improperly manifested files in my testing.
#29
Sigil Developer
Posts: 8,775
Karma: 6000000
Join Date: Nov 2009
Device: many
Sigil uses Google's gumbo parser, which follows the parsing rules of the WHATWG HTML5 standard. In other words, it does exactly the same thing that Chrome, Edge, Safari, and Firefox do when faced with that same broken code.
As for more details, once the epub is loaded, Preview will show you exactly where and what well-formedness errors exist on a file-by-file basis if desired. Just turn off Mend on open. Load the epub "as-is". Hit checkpoint (one mouse click). Run Mend. Compare against the fixed code to see everything that gumbo fixed or changed. Or, instead of Mend, just run a basic well-formed check (see Sigil's menus), which is a basic validator that will show you just the serious well-formedness errors. You have options. For speed and missing-interface reasons, loading an epub is not going to give you a complete list of errors like a validator will, but Sigil can load epubs "as-is" for you to validate or hand fix, or it can auto fix them for you.
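For a feel of what a per-file well-formedness report looks like, here is a rough stand-in written with Python's strict XML parser. It is not Sigil's well-formed check and it is not gumbo (which repairs markup rather than rejecting it); `well_formed_errors` is a hypothetical helper and the two sample files are invented. Note that a strict XML parser without a DTD would also reject named entities such as `&nbsp;`, so this is stricter than what a browser-style parser accepts.

```python
import xml.etree.ElementTree as ET

def well_formed_errors(files):
    """Map each failing file name to (line, column, message)."""
    errors = {}
    for name, text in files.items():
        try:
            ET.fromstring(text)
        except ET.ParseError as err:
            line, column = err.position
            errors[name] = (line, column, str(err))
    return errors

# Hypothetical file contents: the second one has an unclosed <p>.
files = {
    "ch1.xhtml": "<html><body><p>fine</p></body></html>",
    "ch2.xhtml": "<html><body><p>unclosed</body></html>",
}
report = well_formed_errors(files)
print(sorted(report))   # ['ch2.xhtml']
for name, (line, column, message) in report.items():
    print(f"{name}: line {line}, column {column}: {message}")
```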
#30
Grand Sorcerer
Posts: 28,577
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD