MobileRead Forums - Mobipocket vs ePub: Why worse is better in ebook formats

Tuna · 06-20-2009, 03:26 PM

As has been pointed out, the problem with the initial article is that it confuses the limitations of the formats themselves with the limitations of the readers that implement them.

Netseeker is completely right that you don't need the entire stream in memory to render something like an epub file. In fact, you can go further than simply building the parse tree in memory: you can store it in an index file that lives alongside the document. Indexing need only be done once, when the document is first opened, and can happen in the background while you're reading the first few pages. If you have a limited number of font choices (which is true for most e-readers), you could index every page in the document, complete with the relevant style hints, so that jumping to an arbitrary point would be near-instantaneous.
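To make the streaming idea concrete, here is a minimal Python sketch, assuming a single XHTML chapter and a fixed layout in which a page holds a fixed number of characters (a deliberate simplification; real pagination depends on font metrics and line breaking). It uses the standard library's `xml.etree.ElementTree.iterparse` to visit one paragraph at a time and clears each paragraph once it has been counted, so the text of paragraphs already processed is not kept in memory. The names `build_page_index` and `save_index` and the `chars_per_page` parameter are hypothetical.

```python
import io
import json
import xml.etree.ElementTree as ET

XHTML_P = "{http://www.w3.org/1999/xhtml}p"


def build_page_index(source, chars_per_page=1500):
    """Stream through an XHTML chapter and record where each page starts.

    Assumes a fixed font/layout, so a page is approximated as a fixed
    number of characters. Returns (paragraph_number, char_offset) pairs,
    one per page.
    """
    pages = [(0, 0)]  # page 1 starts at the very beginning
    used = 0          # characters already placed on the current page
    para_no = 0
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != XHTML_P:
            continue
        text = "".join(elem.itertext())  # includes inline children (em, span, ...)
        offset = 0
        # While the rest of this paragraph overflows the current page,
        # fill the page and start a new one inside the paragraph.
        while len(text) - offset > chars_per_page - used:
            offset += chars_per_page - used
            pages.append((para_no, offset))
            used = 0
        used += len(text) - offset
        para_no += 1
        elem.clear()  # discard the paragraph's content once it is counted
    return pages


def save_index(pages, index_path):
    """Write the page index to a sidecar file next to the book."""
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(pages, f)


if __name__ == "__main__":
    sample = io.BytesIO(
        b'<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        b"<p>aaaaaaaa</p><p>bbbbbbbbbbbb</p><p>cc</p></body></html>"
    )
    print(build_page_index(sample, chars_per_page=10))  # [(0, 0), (1, 2), (2, 0)]
```

On a device, something like `save_index(build_page_index("chapter1.xhtml"), "chapter1.xhtml.idx")` (file names hypothetical) would run once in the background, and later opens would read the sidecar instead of re-parsing the chapter.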

The penalty for such behaviour is a more complex parser and some storage overhead (which is hardly an issue when 2 GB flash cards cost only a few dollars). Processor overhead really shouldn't be a problem, even on the oldest devices, though implementing the indexing well does require some understanding of real-time systems. Where files are transferred to the e-reader through a library application on the user's PC, the index files could even be generated during the transfer, leaving the e-reader to do the bare minimum of work to display any arbitrary page.
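The split between PC and device can be sketched in a few lines of Python, assuming the library application has already paginated the book for the device's font settings and has a list of page strings. The PC side writes the pages into a single UTF-8 blob plus a list of byte offsets; on the device, showing page n is then one `seek` and one `read`, with no XML parsing at all. `export_for_device` and `read_page` are hypothetical names, not part of any existing reader or library.

```python
import io


def export_for_device(pages):
    """PC side: concatenate pre-paginated page texts into one UTF-8 blob
    and record the byte offset at which each page starts."""
    blob = io.BytesIO()
    offsets = []
    for text in pages:
        offsets.append(blob.tell())
        blob.write(text.encode("utf-8"))
    offsets.append(blob.tell())  # sentinel: end of the last page
    return blob.getvalue(), offsets


def read_page(f, offsets, n):
    """Device side: one seek and one read per page, no parsing."""
    f.seek(offsets[n])
    return f.read(offsets[n + 1] - offsets[n]).decode("utf-8")
```

Offsets are counted in bytes rather than characters so that non-ASCII text (where one character may take several bytes) still maps to the right position in the file.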

The issue here is that epub in particular (and anything XML-y in general) lends itself to 'lazy' implementations. On modern PCs there is very little penalty for just hacking at a whole file, so the techniques for dealing with large datasets just aren't common knowledge. I've worked for clients who managed to produce 500MB data files and only then wondered why they took a while to process.

In general, a format like epub lends itself to transformation, so it could be regarded as a 'transfer' format that is translated into a device-specific variant enabling efficient rendering, storage and retrieval. Whilst there are pathological cases that make parsing more complex, these can usually be transformed into simpler parse trees, and publishers should recognise that over-complex formatting benefits no-one.
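As one illustration of what transforming to a simpler parse tree might look like, the Python function below flattens a paragraph's nested inline markup into a flat list of `(styles, text)` runs and merges neighbouring runs that share the same styles. It assumes un-namespaced tags such as `b` and `i` for brevity, and `flatten_runs` is a hypothetical name. Redundant nesting, such as a `span` inside a `span`, collapses into a single style entry, so a renderer consuming the runs never has to walk a deep tree.

```python
import xml.etree.ElementTree as ET


def flatten_runs(para):
    """Flatten nested inline markup into (styles, text) runs.

    Each run carries the set of tags that apply to it; adjacent runs
    with identical styles are merged.
    """
    runs = []

    def walk(elem, styles):
        if elem.text:
            runs.append((styles, elem.text))
        for child in elem:
            walk(child, styles | {child.tag})  # child inherits parent styles
            if child.tail:
                # Tail text belongs to the parent, so it keeps the parent's styles.
                runs.append((styles, child.tail))

    walk(para, frozenset())

    merged = []
    for styles, text in runs:
        if merged and merged[-1][0] == styles:
            merged[-1] = (styles, merged[-1][1] + text)
        else:
            merged.append((styles, text))
    return merged
```

For example, `<p><b>bold <i>both</i></b> plain</p>` becomes three runs: bold text, bold-italic text, and plain text.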
