A few things:
There is *never* a need to restructure an epub to Sigil norm. Sigil norm is just the epub layout that old Sigil FORCED your epub into just to open it. It is no better than any other ad hoc epub layout. And restructuring an epub to an arbitrarily chosen layout need never be done more than once (if at all).
Restructuring does not use cryptographic hashes. It does in fact move actual files in Sigil's temp folder (real underlying files are needed for things like listening for external changes to those files via Open With), even though caches of file contents are used for quicker editing. After the move, *all* xhtml files must be fully parsed with gumbo to fix and update every relative link, and all css urls and file references and the nav/toc must be updated as well. The OPF must also be fully parsed and updated.
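To give a feel for why every link has to be revisited after a move, here is a minimal Python sketch of recomputing one relative href. It is not Sigil's actual code (Sigil does this in C++ on gumbo-parsed documents); the function name `rebase_href` and the `moves` mapping of old book paths to new book paths are hypothetical names used only for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def rebase_href(href, old_src, new_src, moves):
    """Recompute a relative href found in a file that moved from old_src to new_src.

    moves is a hypothetical dict mapping old book paths to new book paths
    for every file that was moved during restructuring.
    """
    # Leave external links (http:, mailto:, ...) untouched.
    if urlsplit(href).scheme:
        return href
    # Split off any fragment so only the path part is rewritten.
    path, sep, fragment = href.partition("#")
    if not path:
        return href  # same-file link such as "#note1"
    # Resolve the link against the OLD location of the source file.
    old_target = posixpath.normpath(
        posixpath.join(posixpath.dirname(old_src), path)
    )
    # Find where the target lives now (unchanged if it did not move).
    new_target = moves.get(old_target, old_target)
    # Express the new target relative to the NEW location of the source file.
    new_path = posixpath.relpath(new_target, posixpath.dirname(new_src) or ".")
    return new_path + sep + fragment
```

This has to run for every href, src, and css url in every file, which is why the whole book is reparsed when files move.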
Being an editor means that file hash tables and advance indexing are worthless, as the only thing you can index in a file is file position and that becomes worthless after a single character change.
So worker threads are used wherever possible to concurrently parse all xhtml files on the fly to find ids, etc., only when needed.
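As a rough illustration of that approach (not Sigil's implementation, which is C++ with gumbo), the following Python sketch parses the current text of several files in a thread pool and collects their ids on demand. The standard library `HTMLParser` stands in for gumbo, and the names `IdCollector`, `collect_ids`, and `collect_ids_for_all` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects the value of every id attribute seen while parsing."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <a id="n1"/>.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def collect_ids(source):
    """Parse one file's current text and return its ids in document order."""
    parser = IdCollector()
    parser.feed(source)
    parser.close()
    return parser.ids


def collect_ids_for_all(files, max_workers=4):
    """files maps book path -> current XHTML text; returns book path -> ids."""
    paths = list(files)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One parse job per file; pool.map keeps results in input order.
        results = pool.map(collect_ids, (files[p] for p in paths))
        return dict(zip(paths, results))
```

Nothing is stored between calls here: the ids are recomputed from the current text each time they are needed, so there is no index to go stale after an edit. (Python threads will not parse in true parallel the way native worker threads do; the sketch only shows the shape of the idea.)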
I still find the thought of 1200 chapter files being truly needed for even any textbook to be absurd. 1200 pages maybe but 1200 chapters just means someone has not thought through the best organization and structure.
I will look into storing zip mod dates by file contents hash and not file path.
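A minimal Python sketch of what that could look like, purely as an assumption about one possible approach and not a description of anything Sigil currently does: record each zip entry's modification date under a hash of its contents, then look the date up again by contents when saving. SHA-256 and the helper names `content_key`, `record_dates`, and `date_for` are choices made only for this example.

```python
import hashlib
import zipfile


def content_key(data):
    """Key for a file based on its bytes, independent of its path."""
    return hashlib.sha256(data).hexdigest()


def record_dates(zf):
    """Map content hash -> date_time tuple for every entry in an open zip."""
    dates = {}
    for info in zf.infolist():
        # Keep the first date seen if two entries have identical contents.
        dates.setdefault(content_key(zf.read(info)), info.date_time)
    return dates


def date_for(data, dates, fallback):
    """Return the recorded date for unchanged contents, else the fallback."""
    return dates.get(content_key(data), fallback)
```

With dates keyed this way, a file that was only moved or renamed keeps its original date, while a file whose contents changed falls back to a new date.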
Quote:
Originally Posted by democrite
Pretty pretty please, I hope that is possible. I would otherwise need to be conscious that first I would restructure to Sigil norm before any other changes. That is not so bad but if it isn't too much effort, other would be great.
I haven't had a chance to look much at the code yet, though recently I've been thinking I will start to look at it more. I do not know if other file metadata is somehow kept track of. Perhaps such would help for possible future features. I recall, though I do not know if such has changed, that when adding links to IDs (I think it was) there was some delay for larger EPUBs. In such cases, I have a fair number of files with thousands or tens of thousands of IDs, common with academic works from certain publishers. Maybe the indexing of Sigil could grow over time such that more becomes possible. I was thinking of navigation features such as goto (id, text content, or title). Perhaps you are used to such in whatever editor or IDE you use; such would too be nice someday for Sigil.
When you say restructuring moves files, I'm not sure if that means such is really what happens in the temp folder. If so, I do not know if that is necessary over moving the file. Would such be less efficient? In my case of APFS, disk usage and battery life wouldn't be affected though unsure about NTFS, ext4, or other Linux file systems.
Of some informal testing of speed of opening an EPUB, with one with ~6000 images, 13 seconds vs 8.
edit: I'd guess restructuring to Sigil norm also recalculates checksums. Of a somewhat medium sized file, 1200 XHTML and ~250 images, restructuring took a while. I didn't time it but it seemed like half a minute or more. You may want to make such a preference. A faster hashing algorithm too might well be worth investigating if possible.