|  03-18-2019, 04:21 AM | #1 | 
| Chalut o/            Posts: 486 Karma: 678910 Join Date: Dec 2017 Device: Kobo |  Remove the non-indexed elements?! 
			
			When opening an ePub, Sigil only loads the files indexed in the <manifest> and deletes any non-indexed files. I understand this behavior, which aims to remove parasitic files added by other unscrupulous and indelicate software. However, more than once I have opened an ePub and ended up with only HTML pages and no images or CSS sheets. Those files are present in the ePub; they are simply not indexed in the <manifest> (probably because of unscrupulous software that confuses <manifest> and <spine>). What I reproach Sigil for is the unilateral decision to delete non-indexed files without warning or consulting the user, because the risk of completely breaking an ePub is significant. It would be nice if, when opening, Sigil compared the <manifest> with the contents of the ZIP archive and, whenever a non-indexed file is found, asked the user whether to add it to the <manifest>. Ideally, we would get a checklist of the files we want to keep or not. | 
|   |   | 
|  03-18-2019, 06:18 AM | #2 | 
| Grand Sorcerer            Posts: 28,866 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			Sigil's every action is predicated on the notion that the opf is entirely correct. And every action Sigil undertakes ensures that the opf stays that way. Sigil can't warn you about unmanifested xhtml files when it opens an epub, because it doesn't know they exist. It's not purposely deleting them. It hasn't made a "unilateral decision." It just hasn't loaded them. Because the opf (the boss) didn't tell it to do so when it was being parsed. What you're asking for would take a complete overhaul of how Sigil opens epubs, or a complete overhaul in how they're saved. Neither would be a trivial undertaking.
		 Last edited by DiapDealer; 03-18-2019 at 06:21 AM. | 
|   |   | 
|  03-18-2019, 09:17 AM | #3 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Sounds like you should create an input plugin that walks the contents of the epub's zip file and adds any files present but not manifested that are css, image, font, or xhtml files to the manifest, and the xhtml files to the end of the spine in some pattern (but what order?). You might want to include unmanifested javascript files as well. It appears that these bad "epubs" are nothing more than a zip of a website that has been scraped. Alternatively, unzip these "books" first and then use Sigil's Add Existing ... menu to pull in the pieces you want. | 
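A rough sketch of the manifest-patching step described above, in plain Python. It does not use Sigil's plugin interface (a real input plugin would have to go through that); the function name `add_to_manifest`, the `addedN` id scheme, the extension table, and the sorted spine order are all assumptions made for illustration.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import quote

OPF_NS = "{http://www.idpf.org/2007/opf}"

# Assumed extension-to-media-type table; extend it as needed.
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".html": "application/xhtml+xml",
    ".css": "text/css",
    ".js": "text/javascript",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".svg": "image/svg+xml",
    ".ttf": "font/ttf",
    ".otf": "font/otf",
    ".woff": "font/woff",
}


def add_to_manifest(opf_xml, hrefs):
    """Parse OPF text, manifest `hrefs` (paths relative to the OPF),
    and return the modified <package> element.

    XHTML files are appended to the end of the spine in sorted order,
    which is an arbitrary choice.
    """
    package = ET.fromstring(opf_xml)
    manifest = package.find(f"{OPF_NS}manifest")
    spine = package.find(f"{OPF_NS}spine")
    used_ids = {item.get("id") for item in manifest}

    for n, href in enumerate(sorted(hrefs), start=1):
        ext = posixpath.splitext(href)[1].lower()
        media_type = MEDIA_TYPES.get(ext)
        if media_type is None:
            continue  # unknown type: leave it for the user to decide
        # Pick an id that does not collide with an existing manifest id.
        item_id = f"added{n}"
        while item_id in used_ids:
            item_id += "_"
        used_ids.add(item_id)
        ET.SubElement(manifest, f"{OPF_NS}item",
                      {"id": item_id, "href": quote(href),
                       "media-type": media_type})
        if media_type == "application/xhtml+xml":
            ET.SubElement(spine, f"{OPF_NS}itemref", {"idref": item_id})

    return package
```

Writing the result back into the archive is left out. `ET.tostring(package, encoding="unicode")` would serialize it, with namespace prefixes chosen by ElementTree.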
|   |   | 
|  03-18-2019, 10:00 AM | #4 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Also, since a spine is made up from manifest ids, any file not manifested cannot be in the spine anyway. So no spine order would be relevant either. That is not even close to being an epub. That is just a zip archive. | 
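To illustrate the dependency between spine and manifest, here is a small Python sketch that resolves each spine `idref` through the manifest ids. The function name `spine_hrefs` is hypothetical, and this is only a reading of the OPF structure, not how Sigil itself parses it.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"


def spine_hrefs(opf_xml):
    """Resolve each spine itemref to the href of its manifest item.

    Returns (reading_order, dangling): the hrefs in spine order, and any
    idref values that match no manifest id.
    """
    package = ET.fromstring(opf_xml)
    # The spine never names files directly; it only points at manifest ids.
    id_to_href = {item.get("id"): item.get("href")
                  for item in package.iter(f"{OPF_NS}item")}
    reading_order, dangling = [], []
    for ref in package.iter(f"{OPF_NS}itemref"):
        idref = ref.get("idref")
        if idref in id_to_href:
            reading_order.append(id_to_href[idref])
        else:
            dangling.append(idref)
    return reading_order, dangling
```

A file with no manifest entry has no id, so nothing in the spine can refer to it.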
|   |   | 
|  03-18-2019, 07:33 PM | #5 | 
| Grand Sorcerer            Posts: 28,866 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			Someone is clearly changing the extension of an .htmlz archive to .epub in my opinion. The metadata.opf file they often contain is woefully insufficient for the purposes of epub editing.
		 | 
|   |   | 
|  03-19-2019, 12:17 AM | #6 | |
| Bibliophagist            Posts: 47,992 Karma: 174315100 Join Date: Jul 2010 Location: Vancouver Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos | Quote: 
 A couple of quotes: From the epub3 standard: "All Publication Resources must be referenced from the manifest, regardless of whether they are included in the EPUB Container or made available remotely." And from Mukli Krisztián's epub boilerplate: "The content.opf file is the most important part of the EPUB package, because it defines the structure of the eBook and the metadata. Manifest Section – This section is a list of all the content files, media, fonts, and stylesheets used in the eBook. The files can be listed in any order. However, you should not include a file in the Manifest Section that is not in the EPUB package. Also, you should not have undeclared files in the EPUB package that have not been declared in the Manifest Section." | |
|   |   | 
|  03-19-2019, 04:28 AM | #7 | 
| Chalut o/            Posts: 486 Karma: 678910 Join Date: Dec 2017 Device: Kobo | 
			
			I agree that, theoretically, a file not indexed in the <manifest> "does not exist" in the ePub, and that technically the ePub is therefore broken. But not really, either, because the files are actually used (in <link> and <img>), and not importing them breaks the ePub even more. Problems that could have been fixed become unfixable. Is it a conversion problem? Possibly. But the point is that not everyone respects the standard equally, so a little caution would be good. The most vicious thing is that sometimes some files are indexed and others are not. I am not asking for a systematic addition, just a check, which can be done using the "file table" of the ZIP archive. There is no need to decompress any files: just load the OPF into memory and read the entries of the <manifest> (if Sigil were in C#, I could have tried it). Then Sigil works normally and creates a conforming ePub. If it proves too complex to implement, okay, very sad, but I just wanted to report this problem and ask for some thought on solving it. To answer KevinH: So you are asking me to 1) open each ePub I want to work on with a "ZIP opener", 2) compare the <manifest> in the OPF against the contents of the ZIP, and 3) add the "forgotten" entries? By hand? This is possible, but isn't the goal of software to automate tedious, repetitive and sometimes complex tasks? Might as well take advantage of it. Last edited by un_pogaz; 03-19-2019 at 04:32 AM. | 
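The check described above could look roughly like this as a standalone Python script, run before opening the file in Sigil. It is a sketch under assumptions: the function name `find_unmanifested` is made up, the archive is assumed to contain a valid `META-INF/container.xml`, and encrypted or remote resources are not handled.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"


def find_unmanifested(epub):
    """Return archive entries that the OPF manifest does not declare.

    `epub` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as zf:
        # container.xml names the OPF (the "rootfile") of the publication.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{CONTAINER_NS}rootfile")
        opf_path = rootfile.attrib["full-path"]
        opf_dir = posixpath.dirname(opf_path)

        # Manifest hrefs are relative to the OPF's own folder.
        opf = ET.fromstring(zf.read(opf_path))
        declared = set()
        for item in opf.iter(f"{OPF_NS}item"):
            href = unquote(item.attrib["href"])
            declared.add(posixpath.normpath(posixpath.join(opf_dir, href)))

        # namelist() comes from the ZIP central directory, so only
        # container.xml and the OPF are actually decompressed.
        unmanifested = []
        for name in zf.namelist():
            if name.endswith("/"):  # directory entry
                continue
            if name == "mimetype" or name.startswith("META-INF/"):
                continue
            if name == opf_path or name in declared:
                continue
            unmanifested.append(name)
        return sorted(unmanifested)
```

Calling `find_unmanifested("book.epub")` (the file name is a placeholder) returns a list that could be shown to the user as the checklist suggested in the first post.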
|   |   | 
|  03-19-2019, 05:22 AM | #8 | 
| Hedge Wizard            Posts: 802 Karma: 19999999 Join Date: May 2011 Location: UK/Philippines Device: Kobo Touch, Nook Simple | 
			
			Have a look at the "Modify ePub" plugin for Calibre. This plugin can perform certain jobs on ePubs, including "Add unmanifested file to Manifest". If you run the "Modify ePub" plugin on your ePub before opening it in Sigil, it may very well prevent the problem you are having. | 
|   |   | 
|  03-19-2019, 07:00 AM | #9 | 
| Chalut o/            Posts: 486 Karma: 678910 Join Date: Dec 2017 Device: Kobo | 
			
			Thanks, I'll look at that. (But integrating this safety feature would always be a plus) | 
|   |   | 
|  03-19-2019, 08:53 AM | #10 | 
| Grand Sorcerer            Posts: 28,866 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			What you're trying to open is not technically even an epub. It's just been given an .epub extension. The opf file in that kind of archive is for metadata purposes only. Sigil's not going to accommodate any-old zip archive full of html/css/images just because the extension claims it's an .epub. It must actually BE an epub (or it will be forced to be). And being an epub means certain strictures have to apply.
		 | 
|   |   | 
|  03-19-2019, 12:52 PM | #11 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			And you can still unzip the epub and then use Sigil's Add Existing ... menu to bring in whatever files you want.
		 | 
|   |   | 
|  03-21-2019, 01:05 AM | #12 | |
| Guru            Posts: 681 Karma: 929286 Join Date: Apr 2014 Device: PW-3, iPad, Android phone | 
			
			If I open a good epub  in Sigil, delete the declaration of the image in the opf and save it, the image is still in the epub archive.  Then if I reopen it, there is no warning. The F7 check says "No problems found!" (even though there is still a page that references the image). -- epubcheck says : Quote: 
 This is perfectly correct, but a warning that something is amiss on opening would be helpful. You can unzip it and import the image or whatever; but you need to know the file was there. So the workaround is to do an epubcheck immediately on opening a new file, then exit without saving if you see warning "OPF-003". Last edited by AlanHK; 03-21-2019 at 01:10 AM. | |
|   |   | 
|  03-21-2019, 06:31 AM | #13 | |
| Grand Sorcerer            Posts: 28,866 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | Quote: 
 Sorry, but protecting a user from every single way they could possibly manage to break their own epub if they try hard enough is not a rabbit-hole I plan to start down. Neither is protecting a user from all the ways an externally created epub could already BE broken before Sigil ever touches it. Perhaps that will all change if we ever get to the point where Sigil doesn't need to enforce its own internal structure on all epubs. But until that time ... it is what it is. Last edited by DiapDealer; 03-21-2019 at 08:05 AM. | |
|   |   | 
|  03-21-2019, 07:44 AM | #14 | |
| Banned            Posts: 168 Karma: 10010 Join Date: Oct 2018 Device: Tolino/PRS 650/Tablet | Quote: 
 | |
|   |   | 
|  03-21-2019, 09:07 AM | #15 | |
| Bookmaker & Cat Slave            Posts: 11,503 Karma: 158448243 Join Date: Apr 2010 Location: Phoenix, AZ Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2 | Quote: 
 Y'know, we get crap ePUBs like this, (web archives or the like) or I should say, inquiries about them, all the time, and IMHO, the probable best way to handle that is to follow Thasaidon's suggestion. Run it through Calibre, manifest the unmanifested items, and then you can edit in Sigil, if you wish. Gets you everything you need, and it's a de minimis additional effort. Given Sigil's intent, and what's trying to be done here, I don't see Kevin and Diap changing this anytime soon. Diap is right--protecting every single user from every single possible way that they can screw s**t up is the road to hell and bloat. Hitch | |
|   |   | 