Old 03-18-2019, 04:21 AM   #1
un_pogaz
Chalut o/
Posts: 486
Karma: 678910
Join Date: Dec 2017
Device: Kobo
Remove the non-indexed elements?!

When opening an ePub, Sigil only loads the files indexed in the <manifest>, and will delete any non-indexed files.
I understand this behavior, which aims to remove parasitic files added by other unscrupulous and indelicate software.

However, more than once I have opened an ePub and found only the HTML pages, with no images or CSS sheets.
Yet those files are present in the ePub; they are simply not indexed in the <manifest> (probably because of unscrupulous software that confuses <manifest> and <spine>).

What I reproach Sigil for is the unilateral decision to delete non-indexed files, without warning or consulting the user, because the risk of completely breaking an ePub is significant.
It would be nice if, on opening, Sigil compared the <manifest> against the contents of the ZIP archive and, whenever a non-indexed file is found, asked the user whether to add it to the <manifest>.
Ideally, we would get a checklist of the files we want to keep or not.
Old 03-18-2019, 06:18 AM   #2
DiapDealer
Grand Sorcerer
Posts: 28,848
Karma: 207000000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Sigil's every action is predicated on the notion that the opf is entirely correct. And every action Sigil undertakes ensures that the opf stays that way. Sigil can't warn you about unmanifested xhtml files when it opens an epub, because it doesn't know they exist. It's not purposely deleting them. It hasn't made a "unilateral decision." It just hasn't loaded them. Because the opf (the boss) didn't tell it to do so when it was being parsed. What you're asking for would take a complete overhaul of how Sigil opens epubs, or a complete overhaul in how they're saved. Neither would be a trivial undertaking.

Last edited by DiapDealer; 03-18-2019 at 06:21 AM.
Old 03-18-2019, 09:17 AM   #3
KevinH
Sigil Developer
Posts: 9,069
Karma: 6361556
Join Date: Nov 2009
Device: many
Sounds like you should create an input plugin that walks the contents of the epub's zip file and adds any files present but not manifested that are css, image, font, or xhtml files to the manifest, and the xhtml files to the end of the spine in some pattern (but in what order?). You might want to include unmanifested javascript files as well.
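
As a rough illustration of the manifest half of that idea, here is a minimal Python sketch. It is plain Python, not a Sigil plugin, and it does not use Sigil's plugin API. The names `MEDIA_TYPES` and `manifest_items`, the `addedN` id scheme, and the choice of media types are all assumptions made for the example; adjust them to the epub version you target.

```python
import posixpath
from urllib.parse import quote
from xml.sax.saxutils import quoteattr

# Hypothetical extension -> media type table for the kinds of files
# mentioned above (xhtml, css, images, fonts, javascript).
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".css": "text/css",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".svg": "image/svg+xml",
    ".ttf": "font/ttf",
    ".otf": "font/otf",
    ".woff": "font/woff",
    ".js": "text/javascript",
}


def manifest_items(unmanifested, opf_dir, used_ids=()):
    """Build <item> lines for archive paths that the manifest does not list."""
    used = set(used_ids)
    lines = []
    for path in sorted(unmanifested):
        ext = posixpath.splitext(path)[1].lower()
        media_type = MEDIA_TYPES.get(ext)
        if media_type is None:
            continue  # unknown kind of file: leave the decision to the user
        # Manifest hrefs are relative to the folder that holds the OPF.
        href = quote(posixpath.relpath(path, opf_dir or "."))
        # Pick an id that does not clash with one already in use.
        n = 1
        while f"added{n}" in used:
            n += 1
        item_id = f"added{n}"
        used.add(item_id)
        lines.append(
            f"<item id={quoteattr(item_id)} href={quoteattr(href)} "
            f"media-type={quoteattr(media_type)}/>"
        )
    return lines
```

The sketch only produces manifest entries. It deliberately leaves the spine alone, since the order in which recovered xhtml files should appear is exactly the open question raised above.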

It appears that these bad "epubs" are nothing more than a zip of a website that has been scraped.

Alternatively, unzip these "books" first and then use Sigil's Add Existing ... menu to pull in the pieces you want.
Old 03-18-2019, 10:00 AM   #4
KevinH
Sigil Developer
Posts: 9,069
Karma: 6361556
Join Date: Nov 2009
Device: many
Also, since a spine is made up of manifest ids, any file not manifested cannot be in the spine anyway. So no spine order would be relevant either.
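
To illustrate the point: each spine `<itemref>` carries only an `idref`, which has to match the `id` of a manifest `<item>`. The short Python sketch below resolves spine entries that way; `spine_hrefs` is a hypothetical helper name, and the sketch assumes the OPF uses the standard `http://www.idpf.org/2007/opf` namespace.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_hrefs(opf_xml):
    """Resolve each spine itemref to the href of the manifest item it names."""
    package = ET.fromstring(opf_xml)
    by_id = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    # A spine entry holds only an idref, so it can refer to nothing but a
    # manifest item. An idref with no matching item resolves to None.
    return [
        by_id.get(ref.get("idref"))
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]
```

A file that is missing from the manifest has no id, so there is nothing a spine entry could use to refer to it.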

That is not even close to being an epub. That is just a zip archive.
Old 03-18-2019, 07:33 PM   #5
DiapDealer
Grand Sorcerer
Posts: 28,848
Karma: 207000000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Someone is clearly changing the extension of an .htmlz archive to .epub in my opinion. The metadata.opf file they often contain is woefully insufficient for the purposes of epub editing.
Old 03-19-2019, 12:17 AM   #6
DNSB
Bibliophagist
Posts: 47,944
Karma: 174315098
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by un_pogaz
What I reproach Sigil for is the unilateral decision to delete non-indexed files, without warning or consulting the user, because the risk of completely breaking an ePub is significant.
If the files in the archive don't match up to the files in content.opf, the epub is already completely broken so it's a bit late to talk about the risk of completely breaking an epub. Very rarely—if ever—does Garbage In produce Gospel Out.

A couple of quotes:

From the epub3 standard:
All Publication Resources must be referenced from the manifest, regardless of whether they are included in the EPUB Container or made available remotely.

and from Mukli Krisztián's epub boilerplate:

The content.opf file is the most important part of the EPUB package, because it defines the structure of the eBook and the metadata.

Manifest Section – This section is a list of all the content files, media, fonts, and stylesheets used in the eBook. The files can be listed in any order. However, you should not include a file in the Manifest Section that is not in the EPUB package. Also, you should not have undeclared files in the EPUB package that have not been declared in the Manifest Section.
Old 03-19-2019, 04:28 AM   #7
un_pogaz
Chalut o/
Posts: 486
Karma: 678910
Join Date: Dec 2017
Device: Kobo
I agree that, in theory, a file that is not indexed in the <manifest> "does not exist" in the ePub, and that the ePub is therefore technically broken.
But that is not entirely true either, because the files are actually used (in <link> and <img> elements), and not importing them breaks the ePub even more.
Problems that could have been fixed become unfixable.

Is it a conversion problem? Possibly. But the point is that not everyone respects the standard, so a little caution would be good.
The nastiest part is that sometimes some files are indexed and others are not.

I am not asking for a systematic addition, just a check, which can be done using the "file table" of the ZIP archive. There is no need to decompress any files: just load the OPF into memory and read the entries of the <manifest> (if Sigil were written in C#, I could have tried it).
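
The check described here could look roughly like the following Python sketch, which is not how Sigil itself opens epubs. It lists the archive entries from the ZIP directory, reads `META-INF/container.xml` to locate the OPF, and reports every entry the <manifest> does not mention. The function name `find_unmanifested` is hypothetical, and the sketch assumes a single OPF with the standard namespaces.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_unmanifested(epub_path):
    """Return archive entries that the OPF manifest does not list."""
    with zipfile.ZipFile(epub_path) as zf:
        # namelist() comes from the ZIP directory; no file data is
        # decompressed to build it. Folder entries are skipped.
        entries = {n for n in zf.namelist() if not n.endswith("/")}

        # Only container.xml and the OPF itself are actually read.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        opf = ET.fromstring(zf.read(opf_path))

    # Manifest hrefs are relative to the folder that holds the OPF.
    opf_dir = posixpath.dirname(opf_path)
    manifested = set()
    for item in opf.findall("opf:manifest/opf:item", OPF_NS):
        href = unquote(item.get("href", "")).split("#")[0]
        manifested.add(posixpath.normpath(posixpath.join(opf_dir, href)))

    # Entries that are expected to live outside the manifest.
    ignored = {"mimetype", opf_path}
    return sorted(
        name
        for name in entries - manifested - ignored
        if not name.startswith("META-INF/")
    )
```

An empty result means every file in the archive is accounted for. Anything else is a candidate for the kind of keep-or-drop checklist suggested in the first post.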

Then Sigil works normally and creates a conforming ePub.

If it proves too complex to implement, okay, very sad, but I just wanted to report this problem and ask for some thought on how to solve it.

To answer KevinH: so you are asking me to 1) open each ePub I want to work on with a "ZIP opener", 2) compare the <manifest> in the OPF against the contents of the ZIP, and 3) add the "forgotten" entries? By hand?
This is possible, but isn't the goal of software to automate tedious, repetitive and sometimes complex tasks? Might as well take advantage of it.

Last edited by un_pogaz; 03-19-2019 at 04:32 AM.
Old 03-19-2019, 05:22 AM   #8
Thasaidon
Hedge Wizard
Posts: 802
Karma: 19999999
Join Date: May 2011
Location: UK/Philippines
Device: Kobo Touch, Nook Simple
Have a look at the "Modify ePub" plugin for Calibre. This plugin can perform certain jobs on ePubs including "Add unmanifested file to Manifest".

If you run the "Modify ePub" plugin on your ePub before opening it in Sigil, it may very well prevent the problem you are having.
Old 03-19-2019, 07:00 AM   #9
un_pogaz
Chalut o/
Posts: 486
Karma: 678910
Join Date: Dec 2017
Device: Kobo
Thanks, I'll look at that.
(But integrating this security feature would always be a plus)
Old 03-19-2019, 08:53 AM   #10
DiapDealer
Grand Sorcerer
Posts: 28,848
Karma: 207000000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
What you're trying to open is not technically even an epub. It's just been given an .epub extension. The opf file in that kind of archive is for metadata purposes only. Sigil's not going to accommodate any-old zip archive full of html/css/images just because the extension claims it's an .epub. It must actually BE an epub (or it will be forced to be). And being an epub means certain strictures have to apply.
Old 03-19-2019, 12:52 PM   #11
KevinH
Sigil Developer
Posts: 9,069
Karma: 6361556
Join Date: Nov 2009
Device: many
And you can still unzip the epub and then use Sigil's Add Existing ... menu to bring in whatever files you want.
Old 03-21-2019, 01:05 AM   #12
AlanHK
Guru
Posts: 681
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
If I open a good epub in Sigil, delete the declaration of the image in the opf and save it, the image is still in the epub archive.

Then if I reopen it, there is no warning. The F7 check says "No problems found!" (even though there is still a page that references the image).

-- epubcheck says :

Quote:
WARNING(OPF-003): Item 'OEBPS/Images/img001.jpg' exists in the EPUB, but is not declared in the OPF manifest.
ERROR(RSC-008): Referenced resource is not declared in the OPF manifest.
If I then make any random change (clean HTML, e.g.), save, the new file has lost the image.

This is perfectly correct, but a warning that something is amiss on opening would be helpful. You can unzip it and import the image or whatever; but you need to know the file was there.

So the workaround is to do an epubcheck immediately on opening a new file, then exit without saving if you see warning "OPF-003".
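
That workaround can be partly automated by scanning the epubcheck report text for OPF-003 lines. The Python sketch below assumes the message wording quoted above; real epubcheck output may add a file name or position to each line and may differ between versions, so the pattern is kept loose. `unmanifested_from_report` is a hypothetical helper name, and how the report text is obtained is left open.

```python
import re

# Matches the OPF-003 wording quoted in this post. Anything between the
# message id and "Item" (such as a file name or position) is skipped.
OPF_003 = re.compile(
    r"WARNING\(OPF-003\):.*?Item '(?P<path>[^']+)' exists in the EPUB"
)


def unmanifested_from_report(report_text):
    """Return the paths that epubcheck flagged as present but undeclared."""
    return [match.group("path") for match in OPF_003.finditer(report_text)]
```

If the returned list is not empty, the files it names are the ones to unzip and re-import before saving anything in Sigil.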

Last edited by AlanHK; 03-21-2019 at 01:10 AM.
Old 03-21-2019, 06:31 AM   #13
DiapDealer
Grand Sorcerer
Posts: 28,848
Karma: 207000000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by AlanHK
If I open a good epub in Sigil, delete the declaration of the image in the opf and save it, the image is still in the epub archive.
The question is: why would you do something like that? Anyone knowledgeable enough to risk manually editing the opf should already know that they also need to delete the image-file itself for the removal to be complete. Those NOT knowledgeable enough (or those who are just plain lazy like me), would use Sigil's integral delete function which would take care of both details properly.

Sorry, but protecting a user from every single way they could possibly manage to break their own epub if they try hard enough is not a rabbit-hole I plan to start down. Neither is protecting a user from all the ways an externally created epub could already BE broken before Sigil ever touches it.

Perhaps that will all change if we ever get to the point where Sigil doesn't need to enforce its own internal structure on all epubs. But until that time ... it is what it is.

Last edited by DiapDealer; 03-21-2019 at 08:05 AM.
Old 03-21-2019, 07:44 AM   #14
Vroni
Banned
Posts: 168
Karma: 10010
Join Date: Oct 2018
Device: Tolino/PRS 650/Tablet
Quote:
Originally Posted by un_pogaz

Is it a conversion problem? Possibly. But the point is that not everyone respects the standard, so a little caution would be good.
The nastiest part is that sometimes some files are indexed and others are not.

I am not asking for a systematic addition, just a check, which can be done using the "file table" of the ZIP archive. There is no need to decompress any files: just load the OPF into memory and read the entries of the <manifest> (if Sigil were written in C#, I could have tried it).
Nice idea for an input plugin.
Old 03-21-2019, 09:07 AM   #15
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by DiapDealer
The question is: why would you do something like that? Anyone knowledgeable enough to risk manually editing the opf should already know that they also need to delete the image-file itself for the removal to be complete. Those NOT knowledgeable enough (or those who are just plain lazy like me), would use Sigil's integral delete function which would take care of both details properly.

Sorry, but protecting a user from every single way they could possibly manage to break their own epub if they try hard enough is not a rabbit-hole I plan to start down. Neither is protecting a user from all the ways an externally created epub could already BE broken before Sigil ever touches it.

Perhaps that will all change if we ever get to the point where Sigil doesn't need to enforce its own internal structure on all epubs. But until that time ... it is what it is.

Y'know, we get crap ePUBs like this (web archives or the like), or I should say inquiries about them, all the time, and IMHO, probably the best way to handle that is to follow Thasaidon's suggestion.

Run it through Calibre, manifest the unmanifested items, and then you can edit in Sigil, if you wish. Gets you everything you need, and it's a de minimis additional effort.

Given Sigil's intent, and what's trying to be done here, I don't see Kevin and Diap changing this anytime soon. Diap is right--protecting every single user from every single possible way that they can screw s**t up is the road to hell and bloat.

Hitch