MobileRead Forums


hobnail 10-30-2020 04:52 PM

limit on size of html file?
 
Is there a limit on how big an html file can be? When working on a Project Gutenberg book, I first merge all of the files together (excluding the first one, the cover page) and then split the result at the chapter (h2) tags. Before splitting, the merged file was 1.5 megabytes and Sigil would only display the cover page in the Preview window.

The book was the EPUB (no images) from here: https://www.gutenberg.org/ebooks/23646

KevinH 10-30-2020 06:06 PM

Did you attempt to scroll down? Did you try clicking in CodeView to sync Preview?
Yes, extremely large single files will cause Sigil to slow down. Try turning off Preview.

Tex2002ans 10-30-2020 06:08 PM

Quote:

Originally Posted by hobnail (Post 4052893)
Is there a limit on how big an html file can be?

In EPUB, there's a "soft limit" of ~300 KB per HTML file. Very old e-ink devices had very limited RAM and wouldn't be able to read/open those.

It's also good practice to try to split each chapter into its own HTML file. This:
  • allows each chapter to "start on a new screen".
  • allows easy/fast editing.
  • helps organize files within the EPUB.
    • "Chapter01.xhtml" + "Chapter99.xhtml" is easier to maintain compared to one enormous "book.xhtml" file.
  • allows you to take advantage of various tools/reports.
    • Such as Sigil's Tools > Reports > HTML Files, which can give you exact word counts per file.

Quote:

Originally Posted by hobnail (Post 4052893)
Before splitting, the merged file was 1.5 megabytes and Sigil would only display the cover page in the Preview window.

I merged them together and it works fine for me. :shrug:

What version of Sigil are you using + what were the exact steps?

KevinH 10-30-2020 06:09 PM

Some older e-readers had a limit of 320K or so. So anything much bigger than that should probably be broken up so that it will work with all epub readers.
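
As a rough way to check an existing EPUB against that soft limit, here is a minimal Python sketch that lists the HTML files whose uncompressed size exceeds a threshold. The 300 KB default and the function name `oversized_html_files` are assumptions made for illustration; pick whatever limit suits your target devices.

```python
import zipfile

# Assumed threshold: the thread mentions roughly 300-320 KB as a soft limit.
SOFT_LIMIT_BYTES = 300 * 1024
HTML_SUFFIXES = (".html", ".xhtml", ".htm")


def oversized_html_files(epub_path, limit=SOFT_LIMIT_BYTES):
    """Return (name, uncompressed_size) for each HTML file above the limit."""
    # An EPUB is a zip archive, so the stdlib zipfile module can read it.
    with zipfile.ZipFile(epub_path) as epub:
        return [
            (info.filename, info.file_size)
            for info in epub.infolist()
            if info.filename.lower().endswith(HTML_SUFFIXES)
            and info.file_size > limit
        ]
```

Calling `oversized_html_files("book.epub")` returns an empty list when every HTML file is under the limit. Images and fonts are ignored on purpose, since the limit discussed here applies to the markup files.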

KevinH 10-30-2020 06:14 PM

I will try testing with it tonight and report back if I see any issues on macOS

PS: I see Tex2002ans beat me to it!

hobnail 10-30-2020 06:26 PM

Quote:

Originally Posted by Tex2002ans (Post 4052921)
I merged them together and it works fine for me. :shrug:

What version of Sigil are you using + what were the exact steps?

I downloaded the book and opened it in Sigil. I clicked on the 2nd file (the one after wrap0000.html), shift-clicked on the last file, and hit Ctrl+M to merge them. The merged file is open in the editor and I can do the usual stuff there, but the Preview window is stuck on the cover page. I don't need the Preview window at this time, but it's handy to see what's going on before I split the file at the chapter tags, because I tend to do some cleanup before I split it; I could just as easily do that after splitting, though. After splitting, the Preview window works as expected. This is Sigil 1.3.0.

KevinH 10-30-2020 06:46 PM

Yes, I can recreate this. The problem is Qt is refusing to load the page in Preview in the time allotted. In other words it just times out.

I will see if there is a way to make Qt not time out.

Thanks for the bug report!

KevinH 10-30-2020 07:46 PM

Well, this is actually a limitation in QtWebEngine/Chromium. They encode each xhtml file into a data: URL, and that URL, after encoding, is limited to 2 MB.

After that it will simply refuse to load. You are hitting that limitation. Qt is aware of the bug but refuses to fix it as it is upstream (Chromium).

See https://bugreports.qt.io/browse/QTBUG-53414 for example.

After our next release I will look into working around this using file urls and view->load().
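
To illustrate why a file well under 2 MB on disk can still hit this limit, here is a minimal Python sketch that estimates the length of a percent-encoded data: URL. It is an approximation under stated assumptions: the prefix string, the exact set of characters that get escaped, and the helper names are all hypothetical, and `urllib.parse.quote` is only a stand-in for whatever QtWebEngine does internally.

```python
from urllib.parse import quote

# Limit as described in the thread; treat the exact value as an assumption.
MAX_URL_BYTES = 2 * 1024 * 1024
# Assumed prefix; the one actually used by QtWebEngine may differ.
DATA_URL_PREFIX = "data:text/html;charset=UTF-8,"


def data_url_length(markup: str) -> int:
    """Length of a percent-encoded data: URL carrying the markup."""
    # safe="" escapes everything except letters, digits and "_.-~",
    # so each "<", ">", "/", space or newline becomes three characters.
    return len(DATA_URL_PREFIX) + len(quote(markup, safe=""))


def fits_in_preview(markup: str, limit: int = MAX_URL_BYTES) -> bool:
    """True if the encoded URL stays within the assumed limit."""
    return data_url_length(markup) <= limit
```

Because markup is dense with characters that need escaping, the encoded form can be considerably larger than the file on disk, which would be consistent with a 1.5 MB merged file failing to load.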

KevinH 10-30-2020 11:39 PM

For the record and to remind myself, the workaround for this limitation is to install a custom "sigil" url scheme handler.

See this link ... https://stackoverflow.com/questions/...of-2mb-content

DiapDealer 10-31-2020 01:02 PM

Quote:

Originally Posted by KevinH (Post 4053050)
For the record and to remind myself, the workaround for this limitation is to install a custom "sigil" url scheme handler.

See this link ... https://stackoverflow.com/questions/...of-2mb-content

Didn't we go with the url interceptor approach to avoid having to do the custom url scheme handler a little while back? That change itself was to work around a Qt change in behavior with regard to urls to local files, if I recall.

KevinH 10-31-2020 02:11 PM

Yes, and we could have used either approach to fix the blocking of file: urls. I thought both approaches had pluses and minuses and so were about equal, so we decided to go with the url interceptor route.

But we did not know about the 2 MB single html limit then. Given this limitation, we should have gone with the url scheme handler approach, as it has the added benefit of working around the size limitation. QtWebKit did not have this issue.

So after this next release, I will move things around to use the url scheme handler approach.

Luckily this limitation is not that important: it only impacts Preview, and it only limits the size of a single html file. It does not count resources like images, fonts, video, and audio, where 2 MB might be too strict, and of course each xhtml file gets its own 2 MB.

So it should not really impact any well-designed epubs.

DiapDealer 10-31-2020 06:16 PM

Sounds good. :thumbsup:

KevinH 11-02-2020 11:06 AM

Actually, this is going to be a real pain in the ass. Right now whenever we Preview a page we preprocess the page contents to handle dark and light mode, user custom css, inject mathjax, etc., all without actually changing the xhtml file on disk (and we do not want to change it on disk!).

So any url scheme handler would have to see that the request is to load an xhtml file and, instead of reading it in from disk, somehow look it up in some global hash table storage to get the pre-processed version of the data and reply with that.

This will not be an easy change, as the url scheme handler does not keep the state needed to do the preprocessing itself. So somehow the scheme handler must get the pre-processed version of the file.
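
The lookup idea described above can be sketched in a language-neutral way. The following Python is only an illustration of the pattern, not Sigil's code (Sigil is C++/Qt), and every name in it (`store_preprocessed`, `discard_preprocessed`, `handle_request`) is hypothetical. Preview would store the pre-processed markup under the file's path, and the handler would answer from that store, falling back to disk for anything it does not know about.

```python
from pathlib import Path

# Hypothetical in-memory store: resolved file path -> pre-processed markup.
_preprocessed = {}


def store_preprocessed(path, markup: bytes) -> None:
    """Called by the previewer after it has pre-processed a page."""
    _preprocessed[str(Path(path).resolve())] = markup


def discard_preprocessed(path) -> None:
    """Drop a stale entry, for example after the file is edited."""
    _preprocessed.pop(str(Path(path).resolve()), None)


def handle_request(path) -> bytes:
    """What a scheme handler might return for a requested file."""
    key = str(Path(path).resolve())
    cached = _preprocessed.get(key)
    if cached is not None:
        # Serve the pre-processed version; the file on disk is untouched.
        return cached
    # Images, css, fonts and unprocessed pages are read straight from disk.
    return Path(key).read_bytes()
```

The hard part noted in the post is not the lookup itself but keeping such a store in sync with the editor, which this sketch does not attempt to solve.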

Argh!

DiapDealer 11-02-2020 11:33 AM

Quote:

Originally Posted by KevinH (Post 4053815)
Actually, this is going to be a real pain in the ass.

I remember the highly invasive PITA quotient being a big reason for avoiding the custom url-scheme-handler approach originally.

Feel free to put it on the back-burner if you want. A highly invasive overhaul that could likely introduce more higher-profile bugs might not be worth the trouble.

KevinH 11-02-2020 11:46 AM

Especially when that limitation is a compile time constant in QtWebEngine-Chromium and it could be easily changed:

See GetMaxURLChars in https://github.com/qt/qtwebengine-ch...n/url_utils.cc

Hmm... it is worth a shot to try url scheme handlers, but sometime after the new year, as I really do not see a strong need for this in real-world epubs. Certainly any epub that had a single xhtml file over 2 MB would break almost all older and many current epub reading devices.


All times are GMT -4.