01-12-2009, 09:19 AM   #58
llasram
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
Quote:
Originally Posted by nrapallo
Oh, and just so as to be clear (clearer?), I don't view Kovid or Calibre as the culprit in imposing this limitation.
For my two cents -- as much as I like EPUB, I think the real culprit is the Open Container Format (OCF). The OCF basically boils down to "put all the book content in a ZIP file," which is delightfully simple -- but maybe just a bit too simple.

Every other e-book format that I know the details of contains features which explicitly simplify seeking and incremental rendering. LIT compresses book content in 64k chunks, allowing random access into the compressed data; it contains an index of all explicit page-breaks within the book content; and it uses a simplified CSS rendering model without contextual selectors, allowing accurate rendering of an element given only its ancestors. Mobipocket compresses all ebook content in 4k chunks, and it uses a single-level rendering model, allowing rendering with almost no context.
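
To make that concrete, here's a minimal sketch -- in Python, and not either format's actual on-disk layout -- of how chunked compression plus a chunk-offset index buys you random access; compress_chunked and read_chunk are invented names for illustration:

Code:
import zlib

CHUNK_SIZE = 64 * 1024  # LIT-style 64k chunks; Mobipocket uses 4k

def compress_chunked(data):
    """Compress in fixed-size chunks, recording each chunk's offset."""
    chunks, index, offset = [], [], 0
    for i in range(0, len(data), CHUNK_SIZE):
        compressed = zlib.compress(data[i:i + CHUNK_SIZE])
        index.append(offset)  # where this chunk starts in the stream
        chunks.append(compressed)
        offset += len(compressed)
    return b"".join(chunks), index

def read_chunk(stream, index, n):
    """Random access: decompress only chunk n, nothing before it."""
    end = index[n + 1] if n + 1 < len(index) else len(stream)
    return zlib.decompress(stream[index[n]:end])
Jumping to the middle of a book then costs one 64k decompression rather than decompressing everything that precedes it.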

In contrast, EPUB: (a) allows compression only over entire file streams; (b) contains no indices to aid incremental rendering; and (c) mandates a rendering model which requires full file context.
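
You can see the practical consequence of (a) with nothing more than Python's zipfile module (the file names here are hypothetical): because each OCF member is one continuous DEFLATE stream, reaching content near the end of a large file means decompressing and discarding everything before it.

Code:
import zipfile

with zipfile.ZipFile("book.epub") as zf:
    # One DEFLATE stream per file and no chunk index: there is no
    # way to reach the end without inflating the whole stream first.
    with zf.open("OEBPS/chapter.xhtml") as member:
        member.read(50 * 1024 * 1024)  # decompressed and discarded...
        tail = member.read()           # ...just to reach the last paragraph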

For example, EPUB allows a file which looks like:

Code:
<html>
  <head>
    <title>Example</title>
    <style>.first ~ .last { display: none; }</style>
  </head>
  <body>
    <p class="first">Displayed!</p>
    <!-- [50 MB of content] -->
    <p class="last">Not displayed</p>
  </body>
</html>
And it requires that fully conformant reading systems render this correctly, i.e. not display the final paragraph. In the context of the OCF, that means keeping 50 MB of parsed markup around in memory. It's even worse on a device like the Sony Reader: on a system with a hard drive, the reading software could at least extract the compressed data to temporary files and build its indices from there, but the Reader's flash filesystem is only good for so many writes, so instead it has to keep all the extracted file content in RAM -- right alongside all the RAM it needs to actually parse the data and render it.
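
For anyone who wants the context problem in miniature, here's a toy evaluation of that one rule -- deliberately simplified, and not how any real engine is written -- showing that the visibility of the last paragraph depends on an element arbitrarily far earlier in the document:

Code:
from dataclasses import dataclass

@dataclass
class Element:
    classes: set

def hidden_by_rule(preceding_siblings, node):
    """Evaluate '.first ~ .last { display: none }' for one node.
    Answering it needs every preceding sibling -- which is why a
    DOM-based engine keeps the whole parsed tree in memory."""
    return "last" in node.classes and any(
        "first" in s.classes for s in preceding_siblings)

body = [Element({"first"})] + [Element(set())] * 3 + [Element({"last"})]
print(hidden_by_rule(body[:-1], body[-1]))  # True: hidden because of
                                            # an element 50 MB earlier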

So unfortunately EPUB / the OCF needs some sort of arbitrary limit on the size of markup streams. It's the simplest solution to the problem short of ditching the OCF entirely and creating a completely different container format from scratch. I'm hoping that such an explicit -- and, yes, somewhat arbitrary -- limit, ideally a more generous one, will eventually become part of the specification itself.