Why must epub files be broken up?


darkmonk
04-10-2009, 12:21 AM
Hello all, I may be hand coding some epub files soon. I was looking over some specs, and I couldn't figure out why the format wants the book split into different files. Could someone explain why this is, and what one generally uses to split the file? Thanks.

ilovejedd
04-10-2009, 01:19 AM
Most ebook readers have limited resources (processor, memory, etc.). It's easier for them to process 300 KB chunks than a single 1 MB file without splits.

That said, I've found the html2epub command-line tool from Calibre to work pretty well. If it's easier for you to edit, just use one HTML file for your work, and convert it using Calibre's html2epub. It should divide it into chunks automatically.

pepak
04-10-2009, 01:20 AM
That's easy! Unlike computers, e-book devices don't use the powerful CPUs and loads of memory. The splitting allows them to process a small amount of data at a time, which should require less CPU power and less memory.

darkmonk
04-10-2009, 11:55 AM
Originally posted by pepak: That's easy! Unlike computers, e-book devices don't use the powerful CPUs and loads of memory. The splitting allows them to process a small amount of data at a time, which should require less CPU power and less memory.

Well, see, that is a valid argument, but it appears not to be true. I calculated the average clock speed of ebook reader processors to be 355 MHz, and none of the devices had less than 64 MB of RAM. That doesn't seem handicapped to me. None of them should have a problem fitting a book into RAM.

...but then I figured out why it must be: when content is reflowed, e.g. by changing the font size, the change has to be applied to the whole file. That would take some time. And so the limit makes a bit of sense, although I dearly wish it was larger.

But what I was also asking was where I should split them. Having a split in the middle of a chapter would be annoying, but for best responsiveness, I might want to make each chapter its own file. That would also simplify the ToC. Hell, maybe it should have been written into the standard like that.

What are your thoughts?

ilovejedd
04-10-2009, 12:08 PM
By convention (at least from epubs I've seen), splits are made at chapter points and/or hard pagebreaks. 1 file = 1 chapter.

You can create an epub with the HTML not split into chunks, but loading on an ebook reader/device would probably take longer, compared to when the HTML is split.
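
To make the "1 file = 1 chapter" convention concrete, here is a minimal Python sketch of splitting one long body of markup at chapter headings. It assumes every chapter starts with an `<h2>` heading, which is only an assumption for illustration; `split_at_chapters` is a hypothetical helper, not part of Calibre or any epub tool. It also ignores that each resulting file would still need its own XHTML wrapper (head, body, and so on).

```python
import re

def split_at_chapters(body, heading_tag="h2"):
    """Split markup into chunks, each starting at a chapter heading.

    Hypothetical helper: assumes chapters begin with <heading_tag>.
    """
    heading = re.compile(r"<%s\b" % re.escape(heading_tag), re.IGNORECASE)
    starts = [m.start() for m in heading.finditer(body)]
    # Keep any front matter that appears before the first heading.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    ends = starts[1:] + [len(body)]
    chunks = [body[s:e] for s, e in zip(starts, ends)]
    # Drop whitespace-only pieces.
    return [c for c in chunks if c.strip()]

book = (
    "<p>Title page</p>"
    "<h2>Chapter 1</h2><p>First chapter text.</p>"
    "<h2>Chapter 2</h2><p>Second chapter text.</p>"
)
chunks = split_at_chapters(book)
for number, chunk in enumerate(chunks):
    # File names here are made up for the example.
    print("chunk%03d.xhtml" % number, len(chunk), "characters")
```

A regular expression is good enough for a sketch like this, but a real tool would more likely work on the parsed document so that it never cuts inside an element.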

wallcraft
04-10-2009, 12:29 PM
Adobe Digital Editions for handheld devices is currently available on the Sony PRS-505 and PRS-700, but will soon be available on several other EInk devices. It requires the ePub to be split into manageable chunks, which are typically chapters. If a chapter is very long it may require multiple files per chapter. The easiest way to handle this is to run your document through Calibre, which will add PRS-505 specific processing if you ask for it. This includes chapter splitting and other work-arounds. Note that the input can be ePub if you like (or HTML or whatever), and you can always explode the resulting ePub and apply your own customization afterwards if you want.
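
For the case where a single chapter is too long for one file, a size-based split at paragraph boundaries might look roughly like the Python sketch below. This is not how Calibre does it; `split_by_size` is a hypothetical helper, and the 300 KB default is just the rule-of-thumb figure mentioned earlier in the thread, not a limit taken from any specification. The sketch only cuts after a closing `</p>` and does not try to keep enclosing elements balanced.

```python
def split_by_size(chapter, limit_bytes=300 * 1024, boundary="</p>"):
    """Split one chapter into pieces of at most limit_bytes where possible,
    cutting only directly after a paragraph end (hypothetical helper)."""
    # First cut the text into units that each end with the boundary.
    units, start = [], 0
    while True:
        end = chapter.find(boundary, start)
        if end == -1:
            break
        end += len(boundary)
        units.append(chapter[start:end])
        start = end
    if start < len(chapter):
        units.append(chapter[start:])  # trailing text after the last boundary

    # Then pack units greedily into pieces that stay under the limit.
    pieces, current, size = [], [], 0
    for unit in units:
        unit_size = len(unit.encode("utf-8"))
        if current and size + unit_size > limit_bytes:
            pieces.append("".join(current))
            current, size = [], 0
        current.append(unit)  # a single oversized unit still becomes its own piece
        size += unit_size
    if current:
        pieces.append("".join(current))
    return pieces

# Ten paragraphs of 100 bytes each, with an artificially small limit.
chapter = ("<p>" + "x" * 93 + "</p>") * 10
pieces = split_by_size(chapter, limit_bytes=350)
print([len(p) for p in pieces])
```

With the small limit used here, each piece holds at most three of the 100-byte paragraphs, and joining the pieces gives back the original chapter unchanged.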

HarryT
04-10-2009, 12:55 PM
I must admit that I still don't really "get" why this is really necessary. Why does it take more memory to "process" a long book than a short one, when all that one needs to display is the current page? It doesn't seem to be an issue with any other book format.

kovidgoyal
04-10-2009, 01:00 PM
Originally posted by HarryT: I must admit that I still don't really "get" why this is really necessary. Why does it take more memory to "process" a long book than a short one, when all that one needs to display is the current page? It doesn't seem to be an issue with any other book format.

It's because of the way CSS works. In order to correctly apply CSS rules, the entire XHTML tree must be parsed into memory.

HarryT
04-10-2009, 01:17 PM
Thanks, Kovid; I hadn't realised that. Is there really no other way to do it?

Jellby
04-10-2009, 01:19 PM
Originally posted by HarryT: I must admit that I still don't really "get" why this is really necessary. Why does it take more memory to "process" a long book than a short one, when all that one needs to display is the current page? It doesn't seem to be an issue with any other book format.

Well... I've seen a (maybe related) issue with mobipocket in the Cybook. If you have an anchor inside an element (say inside <h1>...</h1>) and you jump to this anchor, then you don't see the content formatted according to the element (i.e., you don't see the <h1> content in bold large font), probably because the reader doesn't know it's inside an element... probably because it didn't load the whole book into memory... (I don't know if this happens always, but it happens sometimes at least). That's also probably a reason why the nesting level in mobipocket is severely limited.

HarryT
04-10-2009, 01:22 PM
Originally posted by Jellby: Well... I've seen a (maybe related) issue with mobipocket in the Cybook. If you have an anchor inside an element (say inside <h1>...</h1>) and you jump to this anchor, then you don't see the content formatted according to the element (i.e., you don't see the <h1> content in bold large font), probably because the reader doesn't know it's inside an element... probably because it didn't load the whole book into memory... (I don't know if this happens always, but it happens sometimes at least). That's also probably a reason why the nesting level in mobipocket is severely limited.

That's very true - you have to make sure that your links are "outside" formatting tags.

kovidgoyal
04-10-2009, 01:34 PM
Originally posted by HarryT: Thanks, Kovid; I hadn't realised that. Is there really no other way to do it?

Not really. CSS allows for selectors based on the ancestors and descendants of a tag, and to use those selectors, the full tree must be loaded. In addition, as Jellby pointed out, CSS properties can be inherited, for which you need at least the entire tree before a tag.
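
A rough Python sketch of why this is so, using the standard `xml.etree.ElementTree` module. It is not how any real reading system implements CSS: the sample document, the one-rule "stylesheet" in `declared`, and the helper names are all made up for illustration. The point is only that neither question ("does this selector match?" and "what value does this element inherit?") can be answered by looking at the element alone.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<body>"
    "<div class='poem'><p>Roses are <em>red</em></p></div>"
    "<p>Plain paragraph</p>"
    "</body>"
)

# ElementTree keeps no parent pointers, so build a child -> parent map.
# Building it already requires a pass over the whole parsed tree.
parent_of = {child: parent for parent in doc.iter() for child in parent}

def ancestors(element):
    """Yield the parent, grandparent, ... of element, up to the root."""
    while element in parent_of:
        element = parent_of[element]
        yield element

def matches_descendant_selector(element, ancestor_class, tag):
    """Rough equivalent of the CSS selector '.ancestor_class tag'."""
    if element.tag != tag:
        return False
    return any(a.get("class") == ancestor_class for a in ancestors(element))

# Hypothetical stylesheet with one rule:  .poem { font-style: italic }
declared = {"poem": {"font-style": "italic"}}

def inherited_value(element, prop):
    """Return the value from the element itself or its nearest ancestor."""
    for node in [element, *ancestors(element)]:
        rule = declared.get(node.get("class"), {})
        if prop in rule:
            return rule[prop]
    return None

paragraphs = doc.findall(".//p")
results = [matches_descendant_selector(p, "poem", "p") for p in paragraphs]
print(results)  # [True, False]

em = doc.find(".//em")
plain = paragraphs[1]
print(inherited_value(em, "font-style"))     # italic
print(inherited_value(plain, "font-style"))  # None
```

Both `<p>` elements look identical in isolation; only the chain of ancestors tells them apart, which is why a reader that jumps into the middle of a file without that chain can get the formatting wrong.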

mtravellerh
04-10-2009, 01:41 PM
Originally posted by HarryT: That's very true - you have to make sure that your links are "outside" formatting tags.

You can make links with ids, too. This way Mobi recognizes the tags, too.

<h2 id="toc">Contents