Application slowness after 1000+ chapters

Dire_Storm · 12-19-2021, 12:31 AM

I'm making an epub for a novel with a lot of updates (it's on chapter 1500 or something). The other features in Sigil work just fine; however, after getting to chapter 1000, it takes longer and longer to "Add Blank HTML File" or "Add Copy".

Is there a way to combine the chapters into a single HTML file while still being able to list each chapter in the table of contents? Or is there another way to reduce the slowness of adding an HTML file?

The_book · 12-19-2021, 03:38 AM

Combining the chapters into a single HTML file will also be slow.
There is one way:
Edit the chapters in a single file in VS Code or another text editor. Add break markers to this file.
Import this file into Sigil (this will be a little slow).
Press F6 (Split at Markers). Wait until everything is done.
Now you have all the chapters.

KevinH · 12-19-2021, 07:55 AM

Forgive me but what single book needs over 1500 chapters? Perhaps you would be better served splitting this into multiple epubs (volume 1, volume 2) than trying to shoehorn so many "chapters" into one epub.

Tex2002ans · 12-19-2021, 06:23 PM

Quote:

Originally Posted by Dire_Storm

Is there a way to combine the chapters into a single HTML file while still being able to list each chapter in the table of contents?

Yes, just select them in the Book Browser (the left-hand column with all the files listed), then Right-Click > Merge.

If the files have some sort of categories (like Parts, Years, etc.), try to combine them into logical points. That may make more sense.

If not, then do something like:

Chapters0001-0010.xhtml
Chapters0011-0020.xhtml
[...]
Chapters1001-1010.xhtml

That would cut your file count down by a factor of 10.
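The ten-per-file naming scheme above can be sketched in a few lines of Python. This is only an illustration: `grouped_filenames` and its parameters are names made up here, not anything Sigil provides, and the zero-padded `ChaptersNNNN-NNNN.xhtml` pattern simply follows the examples in this post.

```python
def grouped_filenames(total_chapters, per_file=10):
    """Return one filename per group of `per_file` consecutive chapters."""
    names = []
    for start in range(1, total_chapters + 1, per_file):
        # The last group may be shorter if the total is not a multiple of per_file.
        end = min(start + per_file - 1, total_chapters)
        names.append(f"Chapters{start:04d}-{end:04d}.xhtml")
    return names


if __name__ == "__main__":
    names = grouped_filenames(1500)
    print(len(names), names[0], names[-1])
```

The merged files themselves would still be produced in Sigil (select the files, then Right-Click > Merge); the sketch only plans the names.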

Side Note: Another thing to keep in mind is the content.opf + toc.ncx. When dealing with 1000+ separate files, those 2 files can grow extremely large. If they reach the tipping point of >300 KBs, this may make the ebook completely unreadable on older devices.

Quote:

Originally Posted by Dire_Storm

Or is there another way to reduce the slowness of adding an HTML file?

You could swap over to Calibre's Editor. It seems to handle very large # of HTML files fine.

Besides that, you can always just work on your HTML externally, then import them into Sigil.

Or, like KevinH said, just split the thing into 2 or more volumes.

Note: I did test Add Blank HTML File in Sigil + the attached EPUB... and while there is a noticeable delay before the file appears, it's ~2 seconds.

How slow is yours going?

Quote:

Originally Posted by KevinH

Forgive me but what single book needs over 1500 chapters? Perhaps you would be better served splitting this into multiple epubs (volume 1, volume 2) than trying to shoehorn so many "chapters" into one epub.

I've worked on compilations with tons of material (like 20 years of weekly newspaper articles by an author).

Although those were combined into one HTML file "per year", I could see how you could also have one "per article".

I'll attach a test EPUB.

Maybe there's something that can be done in speeding up these extreme cases, similar to when you sped up Reports + Spellcheck.

Attached File Note: All I did was take the original ebook:

https://mises.org/library/business-t...-henry-hazlitt

and ran:

Search: <h1 class="cht"
Replace: <hr class="sigil_split_marker" /><h1 class="cht"

+ Split at Markers (F6).

This split all ~1000 articles into their respective HTML files.
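If you'd rather prepare the markers outside Sigil before importing, the same Search/Replace can be sketched in Python. The function name `add_split_markers` is hypothetical, and the default heading string is just the one from this particular ebook; yours will differ.

```python
MARKER = '<hr class="sigil_split_marker" />'


def add_split_markers(html, heading_start='<h1 class="cht"'):
    """Put a Sigil split marker in front of every matching heading."""
    # Strip markers that are already in place first, so running this
    # twice on the same text does not double them up.
    html = html.replace(MARKER + heading_start, heading_start)
    return html.replace(heading_start, MARKER + heading_start)


if __name__ == "__main__":
    sample = '<h1 class="cht">One</h1><p>text</p><h1 class="cht">Two</h1>'
    print(add_split_markers(sample))
```

Note that this also puts a marker before the very first heading, exactly like the Search/Replace above does.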

KevinH · 12-19-2021, 06:44 PM

I will add this to my list of things to look at but in all honesty I will make this very very low on my list of priorities as I have yet to see a printed book with 1500 chapters all in one volume. Small chapters or not. Nor have I ever found a reader that would support a toc for it.

The only thing I have seen that comes close is a graphic novel with one page per "chapter", but there the size of the image files themselves was the problem.

Sounds like a small database app would be better suited.

Tex2002ans · 12-20-2021, 03:47 PM

Quote:

Originally Posted by KevinH

I will add this to my list of things to look at but in all honesty I will make this very very low on my list of priorities [...]

Yep, definitely a very low priority issue.

Unless there are some severe slowdowns.

Perhaps my test case (~2 second delay) isn't what Dire_Storm is seeing.

Note: I just retested the steps in Calibre Editor, and adding an XHTML file was near-instant.

Maybe it's a case of O(n^2) or O(n^3) sneaking in? (At small-scale # of files, it doesn't matter much, but at large # of files, the "add new file" delay gets dramatically larger.)

Where's Sigil spending that time exactly? Is it a Book Browser update thing? Reading/appending the content.opf?

Quote:

Originally Posted by KevinH

[...] as I have yet to see a printed book with 1500 chapters all in one volume. Small chapters or not. Nor have I ever found a reader that would support a toc for it.

I think the word "chapter" is throwing things off... I think similar would apply to journals (or newspaper-/magazine-like collections of articles).

Remember the behemoth I gave you to test Reports+Spellchecking slowdowns?

Similarly, it was a monthly periodical that ran for 15 years (~1.3 million words, ~900 articles).

Original EPUB can be grabbed here:

https://mises.org/library/complete-l...orum-1969-1984

And attached is a rough HTML-per-article version I created.

* * *

OPF + TOC Filesize Warnings (Side Note)

Like I said earlier... you typically have to pay attention to these key files. You don't want them getting too large, or the ebook may become completely unopenable on certain devices.
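A quick way to keep an eye on those two files is to read their sizes straight out of the EPUB, which is just a zip archive. This is a rough sketch using only Python's standard `zipfile` module; the function names are made up for illustration, and the 300 KB limit is the rule-of-thumb tipping point from this thread, not a figure from any spec.

```python
import zipfile

LIMIT_BYTES = 300 * 1024  # rule-of-thumb tipping point, not an official limit


def key_file_sizes(epub):
    """Return {name: uncompressed size in bytes} for the .opf and .ncx entries.

    `epub` may be a path or an open binary file object.
    """
    sizes = {}
    with zipfile.ZipFile(epub) as archive:
        for info in archive.infolist():
            if info.filename.lower().endswith((".opf", ".ncx")):
                sizes[info.filename] = info.file_size
    return sizes


def oversized(sizes, limit=LIMIT_BYTES):
    """Names of the entries whose uncompressed size exceeds the limit."""
    return sorted(name for name, size in sizes.items() if size > limit)
```

Calling `oversized(key_file_sizes("book.epub"))` (with your own file in place of the placeholder `book.epub`) lists whichever of the two files has crossed the line.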

In the above EPUB, articles were sensibly combined into logical chunks (by Volume/Issue):

18 Volumes / 164 Issues
4 MB = EPUB
133.27 KB = content.opf
234.50 KB = toc.ncx

If you split each individual article:

~900 articles
4.4 MB = EPUB
- ~400 KBs added.
237.46 KB = content.opf
- ~ doubled.
238.86 KB = toc.ncx

If you had more descriptive filenames, like actual article titles, the overhead would be much larger.

Trick: Simplifying Filenames

If you're at/near that ~300 KB tipping point, another trick you can do is simplify your filenames.

I tend to like very human-readable (and easily sortable) filenames:

Chapter.01.-.Title.Goes.Here.xhtml
pg123.-.Figure.01.-.Name.of.Image.png
2021.01.01.-.Last,First.-.Full.Article.Title.Goes.Here.xhtml
Vol.01.Iss.01.-.Art.99.-.Full.Article.Title.Goes.Here.xhtml

But you may want to go with much simpler:

Chapter01-01.xhtml
Chapter01.xhtml
Chap01.xhtml
Ch01.xhtml
pg123.-.Figure01.png
Figure01.png
Fig01.png
2021.01.01.-.Article.Title.xhtml
2021.01.01.xhtml
Vol.01.Issue.01.-.99.xhtml
Vol.01-01-99.xhtml
01-01-99.xhtml

Depending on how many files/links you have throughout, this can cut the filesize down dramatically:

Vol.01.Iss.01.-.Art.99.-.Full.Article.Title.Goes.Here.xhtml
Vol.01-01-99.xhtml

59 vs. 18 characters.
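To get a feel for what that difference adds up to, here is a back-of-the-envelope sketch. The number of times each filename appears is an assumption you would have to count in your own book (manifest entries, TOC entries, internal links); `bytes_saved` is just an illustrative helper.

```python
def bytes_saved(old_name, new_name, references):
    """Bytes saved when every occurrence of old_name becomes new_name."""
    per_reference = len(old_name.encode("utf-8")) - len(new_name.encode("utf-8"))
    return per_reference * references


if __name__ == "__main__":
    old = "Vol.01.Iss.01.-.Art.99.-.Full.Article.Title.Goes.Here.xhtml"
    new = "Vol.01-01-99.xhtml"
    # Assumed for illustration: ~900 files, each filename appearing 3 times.
    print(bytes_saved(old, new, 900 * 3))
```

At 41 bytes saved per occurrence, a few thousand occurrences already add up to more than 100 KB.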

(Same trick if you have indexes... with thousands and thousands of internal links to pages. A simple change of filenames can cut the index's filesize down by more than half.)

The_book · 12-21-2021, 05:20 AM

Quote:

Originally Posted by Tex2002ans

Another thing to keep in mind is the content.opf + toc.ncx. When dealing with 1000+ separate files, those 2 files can grow extremely large. If they reach the tipping point of >300 KBs, this may make the ebook completely unreadable on older devices.

Yes. I remember someone once asked me to convert a CHM file to an EPUB file. I gave it a try, but that file was a dictionary with 160,000 files. The OPF file would have been 23 MB in size, making the converted EPUB completely useless.

DNSB · 12-21-2021, 04:02 PM

I tried @Tex2002ans' epub and it took about 2 seconds to add a blank html file, though that epub has only 882 text files.

sherman · 12-29-2021, 08:14 PM

Note, based on the OP's description, they are probably compiling a web serial. I think long running (eastern) web serials get that large.

KevinH · 12-29-2021, 08:26 PM

Quote:

Originally Posted by sherman

Note, based on the OP's description, they are probably compiling a web serial. I think long running (eastern) web serials get that large.

Even so, why create new files from blank by hand in Sigil instead of creating them all in a folder and using Add Existing to load a bunch of them at once? Even serials need separate volumes at some point.
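The "create them all in a folder" step could look something like the sketch below. The blank XHTML template, the `ChapterNNNN.xhtml` naming, and the function name are all assumptions for illustration, not what Sigil itself generates; afterwards you would load the folder's contents with Add Existing.

```python
from pathlib import Path

# Assumed minimal XHTML skeleton; adjust to match your own book's template.
BLANK_XHTML = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>{title}</title>
</head>
<body>
<h1>{title}</h1>
</body>
</html>
"""


def create_chapter_files(folder, first, last):
    """Write one blank XHTML file per chapter number in first..last."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number in range(first, last + 1):
        path = out_dir / f"Chapter{number:04d}.xhtml"
        if not path.exists():  # never overwrite work that is already there
            text = BLANK_XHTML.format(title=f"Chapter {number}")
            path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```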

Again, 2 seconds is not something that I worry about as most filesystems in e-readers would have issues with that many files. Simply create separate volumes as most separate printed serials do.

Tex2002ans · 12-29-2021, 09:55 PM

Quote:

Originally Posted by KevinH

Again, 2 seconds is not something that I worry about as most filesystems in e-readers would have issues with that many files. Simply create separate volumes as most separate printed serials do.

Ebooks don't have to abide by the laws of the physical.

For example, there are multi-volume works that continue page numbers.

Pages 1-500 in Volume I.

Pages 501-1000 in Volume II.

Physical books need to sometimes be split like that because the binding itself becomes too thick/unwieldy. Ebooks don't have that problem.

It is pretty nice when you have the entire thing combined into one single ebook.

Like I said, it's pretty rare (I've only run across a handful in all my conversions), but it would be nice if Sigil handled huge # of files a little faster/better.

(I'll have to dig out that 6-volume Thomas Jefferson book. I think that's one of the few I ran across that had a multi-volume index, referencing page #s throughout the entire thing. Having all 6 volumes, in a single ebook, would be infinitely better than 3000+ pages of tomes.)

DiapDealer · 12-30-2021, 10:46 AM

Quote:

Originally Posted by Tex2002ans

It is pretty nice when you have the entire thing combined into one single ebook.

Not a sentiment I share at all. The lack of restriction in what is possible with ebooks but not physical books should never be confused with being "a good idea to do in practice".

E or P: there's a point where size becomes detrimental to usage from a point of practicality.

theducks · 12-30-2021, 11:02 AM

Personally, the only justification for a huge, monolithic book is if it references other sections (Volumes in dead tree) because of the way most readers handle internal book files vs other books. With Dead tree, I can have volumes spread out on my table, all open to a specific point within.

To me, Huge means: Sluggish (if it loads at all)

KevinH · 12-30-2021, 01:08 PM

I have run some tests.

If you have selected the "Text" folder in BookBrowser before using Add Blank HTML it just appends the file to the end which requires parsing the complete manifest and spine order of the OPF once to create the new opf.

If you instead have any other xhtml file selected in BookBrowser and use Add Blank HTML, it needs to create the file, update the manifest and spine of the opf to add the file, and then move it to after the selected xhtml file, and that ends up parsing the opf three more times: once to get the spine order up to the selected file, once to get the spine order after the selected file, and once to move and recreate the spine. Note: both the manifest and spine will have over 1000 entries to parse in each case.
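To illustrate why the number of parses matters, here is a sketch of adding a manifest item and placing its spine entry after a selected file with a single parse of the OPF. It uses Python's standard `xml.etree.ElementTree` and is emphatically not Sigil's code: the function names are hypothetical, and using the filename as the manifest id is an assumption made only for the example.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
ET.register_namespace("", OPF_NS)  # serialize OPF elements without a prefix


def insert_after(opf_text, new_id, new_href, after_id=None):
    """Add an xhtml item and put its spine entry after `after_id`.

    The OPF is parsed once; if `after_id` is None or not in the spine,
    the new entry is appended to the end.
    """
    root = ET.fromstring(opf_text)
    manifest = root.find(f"{{{OPF_NS}}}manifest")
    spine = root.find(f"{{{OPF_NS}}}spine")

    ET.SubElement(manifest, f"{{{OPF_NS}}}item", {
        "id": new_id,
        "href": new_href,
        "media-type": "application/xhtml+xml",
    })

    position = len(spine)  # default: append at the end
    for index, ref in enumerate(spine):
        if ref.get("idref") == after_id:
            position = index + 1
            break
    spine.insert(position, ET.Element(f"{{{OPF_NS}}}itemref", {"idref": new_id}))
    return ET.tostring(root, encoding="unicode")


def spine_order(opf_text):
    """Return the spine's idrefs in reading order."""
    root = ET.fromstring(opf_text)
    return [ref.get("idref") for ref in root.find(f"{{{OPF_NS}}}spine")]
```

With one parse, the cost grows with the number of manifest and spine entries once, rather than several times over per added file.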

All parsing of the OPF is done via EmbeddedPython calls. I could rewrite all of that code in Cpp for more speed but in all honesty the rebuilding of the BookBrowser icons (the order is important as it represents spine order) and all of this takes less than 2 seconds even with over 1000 chapter files and that time grows linearly with size as long as you just want to add the blank file to the end (because you have Selected the Text folder in BookBrowser before adding anything).

So I simply do not see a dire need to rewrite the entire opf parser into Cpp just to cut this time down below 2 seconds. The thought of the bugs that would be introduced by the rewrite of long running and debugged python code is just not worth it.

So I will add rewriting the current debugged new_opfparser.py into Cpp to my LONGTERM to-do list but as I have repeatedly said - it is a very very low priority. Web serials are meant to be read on the web with none of the epub structure and overhead. Imagine trying to repaginate in Word over 1000 chapters (or even load all of the chapters) and you can see that Sigil is doing a reasonable job as is.

KevinH · 01-02-2022, 03:24 PM

I have made some modifications that should help speed up inserting a Blank HTML file, especially if it is inserted in the middle of other xhtml files, which took the longest.
