Old 08-18-2009, 05:06 PM   #13
Xenophon
curmudgeon
Quote:
Originally Posted by Valloric View Post
It does. It most certainly does.

It's just that Sigil is not the problem. The epub file is. The markup is... horrible, to say the least. For instance, every paragraph with text starts like this:
Code:
<p onmouseover="PNo(1032)">
And there is no javascript included in the book, from what I can tell. This only serves to slow down browsers. Also, between every two paragraphs with text there is code like this:
Code:
<p><a id="p1033" name="p1033"/></p>
Which does God knows what.

And I see what you mean when you said the text was centered. I thought you meant fully justified, but no, it is really centered. Removing the CSS style that applies "text-align: center" fixed this.

With that CSS style gone, and after removing all the useless "p/anchors" and the onmouseover handlers with some regexes in notepad++, your file now takes up 30MB less memory and can be nicely edited in Sigil with no lag. So CPU usage solved.

The memory consumption is still around 100MB, but your file is 80k XHTML lines-of-code. It is 1.4MB as an epub because epubs are compressed ZIP archives, and plain text gets compressed nicely. In Sigil, text is in uncompressed UTF-16 which means at least two bytes for every character, whereas your file is English in UTF-8, so only one byte per character stored. This effectively doubles the memory required to store your text in Sigil.
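
As a rough illustration of the size argument above (a sketch only, not a measurement of Sigil itself; the sample text is made up), the following compares the same English text as UTF-8, as UTF-16, and zlib-compressed the way a ZIP-based epub would store it:

```python
import zlib

# Hypothetical stand-in for a book's worth of plain English text.
text = "The quick brown fox jumps over the lazy dog. " * 1000

utf8_bytes = text.encode("utf-8")       # one byte per ASCII character
utf16_bytes = text.encode("utf-16-le")  # two bytes per character, no BOM
compressed = zlib.compress(utf8_bytes)  # DEFLATE, the usual ZIP method

utf8_size = len(utf8_bytes)
utf16_size = len(utf16_bytes)
compressed_size = len(compressed)

print("UTF-8:     ", utf8_size, "bytes")
print("UTF-16:    ", utf16_size, "bytes")
print("compressed:", compressed_size, "bytes")
```

Repetitive sample text like this compresses far better than a real book would, so only the UTF-8 to UTF-16 ratio should be taken at face value.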

Now take into account that because of technical limitations of Qt widgets used for Code View and Book View, there are three text buffers instead of one, and that Book View is an embedded web browser... these things add up.

In the end it was your file that was causing the lag. Must have been an old version of calibre they were using when they created it, because that's some really painful markup.
I recognize those HTML parts! If you visit the Baen free library (for example) and read a book there on the Web -- that is, you view it in your browser rather than downloading and reading locally -- you get HTML that contains these parts. They appear to be support for identifying paragraphs by number (those are paragraphs #1032 and #1033 in your quoted examples above). That was useful when users were reporting typos and other bugs to the publisher from the Web, but it serves no purpose in a downloaded eBook.

Kovid built the -Baen preprocessing switch for Calibre exactly to strip that stuff out (at my request). May I suggest that Sigil provide the same capability? It's a straightforward bit of sed script hacking...
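
To give an idea of how little is involved, here is a minimal sketch in Python rather than sed. It is not what Calibre's switch actually does, and the function name and patterns are my own assumptions, written only against the two snippets quoted above:

```python
import re

# Assumed pattern: the onmouseover="PNo(<number>)" attribute plus the
# whitespace in front of it, exactly as in the quoted example.
HANDLER = re.compile(r'\s+onmouseover="PNo\(\d+\)"')

# Assumed pattern: a paragraph that holds nothing but an empty numbered
# anchor, e.g. <p><a id="p1033" name="p1033"/></p>, plus trailing whitespace.
ANCHOR_PARA = re.compile(r'<p>\s*<a id="p\d+" name="p\d+"\s*/>\s*</p>\s*')


def strip_paragraph_numbering(xhtml: str) -> str:
    """Remove the paragraph-numbering handlers and anchor paragraphs."""
    xhtml = HANDLER.sub("", xhtml)
    return ANCHOR_PARA.sub("", xhtml)


sample = (
    '<p onmouseover="PNo(1032)">First paragraph.</p>\n'
    '<p><a id="p1033" name="p1033"/></p>\n'
    '<p onmouseover="PNo(1033)">Second paragraph.</p>\n'
)
cleaned = strip_paragraph_numbering(sample)
print(cleaned)
```

One caveat: if anything in the book links to those pNNNN anchors, removing them would break the links, so it is worth checking for such references first.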

Xenophon

P.S. I'll poke Baen's web guy about removing that cruft from his eBook versions.