Old 11-10-2019, 09:12 AM   #16
DiapDealer
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
No. They're deleted because they're not manifested in the opf. Same as Sigil has always done. We just have to figure out why the url encoding of the hrefs is happening differently in 0.9.991 compared to 0.9.18 and causing the files to be technically unmanifested.

The puzzling part is that according to the snippets you pasted above, it looks like 0.9.18 is the one that's incorrectly url-encoding the href in the manifest.
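To illustrate why a mis-encoded href leaves a file "technically unmanifested", here is a minimal Python sketch. It is not Sigil's code (Sigil is C++); the function name `find_unmanifested` and the sample paths are made up for the example. It only compares percent-decoded manifest hrefs against the paths found in the archive.

```python
from urllib.parse import unquote


def find_unmanifested(zip_paths, manifest_hrefs):
    """Return archive paths that no manifest href resolves to.

    Assumption: hrefs are relative to the same folder as zip_paths and
    have already been read out of the OPF by an XML parser.
    """
    manifested = {unquote(href) for href in manifest_hrefs}
    return sorted(set(zip_paths) - manifested)


zip_paths = ["Text/a&b.xhtml", "Text/cover.xhtml"]

# Correctly url-encoded href: the ampersand is written as %26.
good = find_unmanifested(zip_paths, ["Text/a%26b.xhtml", "Text/cover.xhtml"])

# href that was url-encoded while still carrying the XML entity "&amp;".
bad = find_unmanifested(zip_paths, ["Text/a%26amp%3Bb.xhtml", "Text/cover.xhtml"])

print(good)  # []
print(bad)   # ['Text/a&b.xhtml']
```

In the second case the href decodes to `Text/a&amp;b.xhtml`, which matches nothing in the archive, so the real file looks unmanifested.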

Last edited by DiapDealer; 11-10-2019 at 09:15 AM.
Old 11-10-2019, 09:25 AM   #17
KevinH
Sigil Developer
Posts: 7,647
Karma: 5433388
Join Date: Nov 2009
Device: many
Yes, the original opf is broken. An href should never need to be xml escaped, because it should in fact be url encoded (%XX) *before* being written to the opf xml file. If that were done first, there would be nothing left to xml escape!

So whoever built that href in the original source has a bad bug, especially if they go and start using the xml entity character "&" in filenames.

I will look into a way to look for non url encoded hrefs that are actually xml escaped and try to fix them but this may not be easy.


Good test case!

Kevin

Quote:
Originally Posted by un_pogaz View Post
And what gets deleted are the HTML files, because they are not indexed.

(And I did a test with an empty NCX and same problem)

EDIT : Arf. I don't feel like I'm being clear (even for me) and now that I have the 3 <item> the problem is obvious to me.

In the original ePub, the file name reference is in XML format. On loading, it is translated directly into a "web url", turning the &amp; into %26amp%3B instead of %26, which breaks the reference to the file.

0.9.18 seems to detect the format correctly and therefore avoids the parse error.
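As a rough illustration of the two encodings mentioned above, here is a short Python sketch using only the standard library. It is not taken from Sigil, and the sample href is just an example.

```python
from urllib.parse import quote
from xml.sax.saxutils import unescape

raw_href = "file&amp;name.xhtml"  # href text with the XML entity still in it

# Encoding the escaped text directly: "&" becomes %26 and ";" becomes %3B.
double_encoded = quote(raw_href)

# Removing the XML escaping first, then encoding.
fixed = quote(unescape(raw_href))

print(double_encoded)  # file%26amp%3Bname.xhtml
print(fixed)           # file%26name.xhtml
```

Only the second form decodes back to the real file name `file&name.xhtml`.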
Old 11-10-2019, 09:27 AM   #18
CalibUser
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
I downloaded the update in Google Chrome. It produced a warning that the file is not commonly downloaded and may be dangerous - this seems to be a false warning as my virus checker revealed no issues and the checksum works out OK.
Old 11-10-2019, 09:39 AM   #19
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by CalibUser View Post
I downloaded the update in Google Chrome. It produced a warning that the file is not commonly downloaded and may be dangerous - this seems to be a false warning as my virus checker revealed no issues and the checksum works out OK.
Yes. That commonly happens. It's not technically a "false warning", though. The file is NOT commonly downloaded. Which is Microsoft's sole reason for flagging it as potentially dangerous. That's why we provide checksums and signatures to allow you to assure yourself that we uploaded the file, and that it wasn't tampered with afterward.

This is a pre-release, so it may never see the hits necessary for the "not commonly downloaded" warning to go away. It's a BS warning to scare people away from using software downloaded from the internet. By now, I figure people either trust us or they don't (after verifying through checksums and signatures that the upload hasn't been tampered with, of course).

Last edited by DiapDealer; 11-10-2019 at 09:44 AM.
Old 11-10-2019, 09:50 AM   #20
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by KevinH View Post
I will look into a way to look for non url encoded hrefs that are actually xml escaped and try to fix them but this may not be easy.
It looks to me (just by glancing with the naked eye) that 0.9.18 is correctly adjusting the incorrect xml ampersand entity in the href, while 0.9.991 is doubling down on it. Perhaps the hrefs should be passed through an entity-to-character routine before being re-url-encoded? That's just a quick WAG on my part, though.
Old 11-10-2019, 09:55 AM   #21
un_pogaz
Chalut o/
Posts: 410
Karma: 145324
Join Date: Dec 2017
Device: Kobo
Well, the problem is now known. I hope it can be solved, because otherwise I can't even imagine the number of ePubs that could be broken by this behavior change.

If not all XML entities can be checked, then at least the ampersand should be.
Many software programs normalize hrefs to ASCII (this is what was done here: "Operation Astree" should be written "Opération Astrée"), so only the ampersand remains as a common entity (given that less-than and greater-than are prohibited).

Last edited by un_pogaz; 11-10-2019 at 09:59 AM.
Old 11-10-2019, 10:17 AM   #22
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by un_pogaz View Post
Well, the problem is now known. I hope it can be solved, because otherwise I can't even imagine the number of ePubs that could be broken by this behavior change.
I can imagine the numbers. I imagine the number of epubs being created with illegally xml-encoded hrefs and/or filenames will be quite, quite low. But we'll still try to fix it if we can.
Old 11-10-2019, 11:34 AM   #23
KevinH
Sigil Developer
Those epubs would/should never pass epubcheck at all as even the spaces in the original opf manifest are not properly url encoded either! That manifest item is just all wrong!

Please report this error to whoever generates crap like that so that they fix their "software".

As for Sigil, I have modified Utility.cpp URLDecode and URLEncode routines to try to *un* xmlescape any hrefs before running any URL Encode or Decode on them.
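The general idea of un-escaping before url encoding can be sketched in Python as below. This is only an illustration of the approach, not the actual Utility.cpp change; `normalize_href` is a hypothetical helper name, and decoding any existing %XX sequences first is an extra assumption made here so that an already-encoded href is not encoded twice.

```python
from urllib.parse import quote, unquote
from xml.sax.saxutils import unescape


def normalize_href(href: str) -> str:
    """Hypothetical helper: un-xmlescape, then re-apply url encoding."""
    plain = unescape(href)   # &amp; &lt; &gt; -> & < >
    plain = unquote(plain)   # undo existing %XX so nothing is encoded twice
    return quote(plain, safe="/")


once = normalize_href("Text/Operation Astree &amp; co.xhtml")
twice = normalize_href(once)

print(once)   # Text/Operation%20Astree%20%26%20co.xhtml
print(twice)  # Text/Operation%20Astree%20%26%20co.xhtml
```

The catch with any such repair is that a file whose real name contains a literal `&amp;`, or a literal `%` followed by two hex digits, would be rewritten as well.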

That seems to do the trick in this case but that epub violates the spec clearly and probably will not work on many ebook devices! For their own sake, they should fix that crap.

People keep forgetting that these things represent actual files on actual filesystems and all of the problems and limitations that entails ... illegal characters, spaces, file paths too long, case sensitive vs case insensitive vs case insensitive but case remembered, etc.

And they should remember that not every valid file system path is representable on another device.

Good epub devs will stick to simple, short filenames restricted to the set of characters that exist on all devices and that do not have unicode normalization issues, where the result depends on the specific sequence used to build up the character, as filesystems do not all handle that the same way.
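The normalization point can be shown with a small Python sketch. The same visible name, built from precomposed or from combining characters, produces two different strings and two different url-encoded hrefs. The `is_portable` rule at the end is a hypothetical, deliberately strict example of a "safe" character set, not a rule from the epub spec.

```python
import re
import unicodedata
from urllib.parse import quote

name = "Op\u00e9ration Astr\u00e9e.xhtml"  # "Opération Astrée.xhtml"
nfc = unicodedata.normalize("NFC", name)  # é as one code point (U+00E9)
nfd = unicodedata.normalize("NFD", name)  # e plus combining accent (U+0301)

same_text = nfc == nfd
nfc_href = quote(nfc)
nfd_href = quote(nfd)

print(same_text)  # False
print(nfc_href)   # Op%C3%A9ration%20Astr%C3%A9e.xhtml
print(nfd_href)   # Ope%CC%81ration%20Astre%CC%81e.xhtml

# Hypothetical "portable" rule: ASCII letters, digits, dot, underscore, hyphen.
PORTABLE = re.compile(r"[A-Za-z0-9._-]+")


def is_portable(filename: str) -> bool:
    return PORTABLE.fullmatch(filename) is not None
```

With that rule, `is_portable("Operation_Astree.xhtml")` is true, while the accented name with a space is rejected.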

Hopefully, with the latest commits to master, the next release of Sigil will be more robust to broken opf hrefs.

Last edited by KevinH; 11-10-2019 at 11:45 AM.
Old 11-10-2019, 12:24 PM   #24
KevinH
Sigil Developer
FWIW, the difference between Sigil-0.9.18 and Sigil-0.9.991 here is this: under 0.9.18 the QXmlStream reader automatically decoded the xml escaping (and it still does under 0.9.991), but 0.9.18 completely recreates the opf from what we actually found in the zip, and that does the proper url encoding when creating the new opf. Under 0.9.991 we do not create the opf; we instead use it exactly as it is found inside the epub, which results in a broken opf since the original opf was in fact broken. All we do then is try to properly url encode the hrefs as in the spec, and we end up url encoding the bad xml-escaped string.
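The "reader decodes the escaping automatically" part can be illustrated with Python's ElementTree standing in for QXmlStream. This is an analogy only, not how Sigil is implemented, and the tiny OPF fragment is made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

opf = """<package xmlns="http://www.idpf.org/2007/opf">
  <manifest>
    <item id="x" href="file&amp;name.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
</package>"""

ns = {"opf": "http://www.idpf.org/2007/opf"}
item = ET.fromstring(opf).find("opf:manifest/opf:item", ns)

parsed_href = item.get("href")     # parser has already removed the XML escaping
rebuilt_href = quote(parsed_href)  # what a freshly rebuilt manifest would hold

print(parsed_href)   # file&name.xhtml
print(rebuilt_href)  # file%26name.xhtml
```

Rebuilding the manifest from parsed values therefore hides the original mistake, whereas url encoding the raw attribute text as stored in the file carries the `&amp;` along.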

If we truly need to, we can go back to assuming all opfs are broken, get what we can from it and create a valid opf from scratch. That just seems to defeat the whole purpose of loading an epub "as-is".

In any event, it is probably a good idea for users to use Doitsu's wonderful epubcheck plugin immediately after loading a strange epub just to let you know what is broken, so it can be fixed before using Sigil's more advanced features.
Old 11-10-2019, 12:25 PM   #25
DiapDealer
Grand Sorcerer
Surprisingly enough, EpubCheck doesn't seem to care one bit about an xml entity being in an opf manifest href. It cares more about the space in the href (and that's just a warning, not an error). Even if checked before Sigil attempts to urlencode the href.

Clearly, the file name itself can't have an actual xml entity in it. Neither Sigil nor Epubcheck seem to allow that. Surprises the heck out of me.

I can confirm that your fix seems to handle this situation well enough, though.

Last edited by DiapDealer; 11-10-2019 at 12:31 PM.
Old 11-10-2019, 12:26 PM   #26
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by KevinH View Post
That just seems to defeat the whole purpose of loading an epub "as-is".
Agreed.
Old 11-10-2019, 12:50 PM   #27
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by DiapDealer View Post
Surprisingly enough, EpubCheck doesn't seem to care one bit about an xml entity being in an opf manifest href.
The attached epub, for instance, passes EpubCheck with flying colors. Even though it has an opf manifest href of "file&amp;name.xhtml" pointing to a file named file&name.xhtml.
Attached Files
File Type: epub hardcase.epub (1.8 KB, 109 views)
Old 11-10-2019, 01:47 PM   #28
un_pogaz
Chalut o/
I have no idea who created this ePub, when, or with which software.
The last known software to have touched this ePub is calibre 0.8.44, in 2012.
Is it the culprit? Unlikely, but it remains to be seen.

I agree that this ePub does not respect the standard, but I think that "old" ePubs like this one were created by software that, at the time, was only a rough beginning of what it is today.
And I think that this mistake is in many more books than we think; it has simply gone unnoticed all this time.
Because this error is incredibly subtle: from an XML point of view it is not an error!
And until now it was not even known or treated as such. It just got fixed because the processing steps happened to run in the right order.

Last edited by un_pogaz; 11-10-2019 at 02:03 PM.
Old 11-10-2019, 02:10 PM   #29
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by un_pogaz View Post
from an XML point of view it is not an error!
Meaningless when Epub is not confined solely to xml rules.

Neither of us can really say how many epubs this sort of thing might occur in, but it's still my contention that it's not nearly as prevalent as you seem to want it to be.

As for calibre being the last software to touch the epub, well that's not really indicative of much of anything either. Calibre's editor prides itself on making as few changes to an epub's code as possible, and will only attempt to fix things when specifically asked to. I can confirm that calibre will correct this situation only when given permission to do so after running the "Check Book" tool.

Regardless ... Kevin has already pushed a commit that seems to fix the issue for Sigil as well. Just keep in mind that we won't always be able to save everyone from themselves automatically (with non-invasive changes). Garbage in is still garbage in.
Old 11-11-2019, 02:22 AM   #30
odamizu
just an egg
Posts: 1,587
Karma: 4300000
Join Date: Mar 2015
Device: Kindle, iOS
I've spent a little time playing with Sigil 0.9.991 on macOS High Sierra, and so far so good. I also opened 0.9.991 on Catalina, but haven't done anything further than to make sure it launches successfully.

Sigil itself is running fine so far, however I've run into one plugin (InsertImageSVG) that fails when it comes to epubs with non-Sigil-standard structures. Using Tools > Restructure Epub to Sigil Norm solves the problem.

QUESTION: I understand KevinH and DiapDealer went to great lengths to enable Sigil to accept and handle any epub structure. Is there an advantage to leaving the epub in its original non-Sigil-standard structure? Is there any reason that I should not use the Tools > Restructure Epub to Sigil Norm as soon as I load an epub into Sigil as part of my workflow now?

Last edited by odamizu; 11-11-2019 at 03:23 AM. Reason: updated info