Old 06-20-2009, 02:50 PM   #31
jgray
Fanatic
Quote:
Originally Posted by netseeker View Post
Can mobile browsers render even large XHTML documents correctly? Yes. Do they need to load the whole document into memory? No. ePub renderers don't need to do that either. It's a matter of how the software developers implement their CSS renderer, not really a matter of "XML parse trees".
Even on mobile devices, XML can be processed as a stream. The parsing can be done using event models (SAX) or newer approaches (StAX). The XML isn't the problem; some CSS styles are. Mobile browsers were able to solve that, and hence ePub renderers could solve it too.
I was just reading up on StaX. I'm no expert, but this does look like a good fit for an e-reader application.
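For anyone curious how the pull-parsing style translates into practice, here is a minimal sketch using Python's stdlib `xml.etree.ElementTree.iterparse` as a stand-in for StAX (the synthetic document is invented for the example):

```python
# Event-driven parsing sketch: elements are handled and discarded as
# the stream is read, so peak memory stays bounded regardless of size.
import io
import xml.etree.ElementTree as ET

# A synthetic 1000-paragraph XHTML-like document.
xhtml = io.BytesIO(
    b"<html><body>" + b"<p>paragraph</p>" * 1000 + b"</body></html>"
)

count = 0
for event, elem in ET.iterparse(xhtml, events=("end",)):
    if elem.tag == "p":
        count += 1
        elem.clear()  # drop text/children once processed
print(count)  # 1000
```

As with StAX, the application pulls events one at a time and decides what to keep, which is what makes arbitrarily large documents tractable on memory-constrained devices.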

From reading your comments (including the ones later in the thread), it certainly looks like you know what you're talking about. Perhaps you should work at Adobe.
Old 06-20-2009, 03:02 PM   #32
netseeker
sleepless reader
Quote:
Originally Posted by jgray View Post
I was just reading up on StaX. I'm no expert, but this does look like a good fit for an e-reader application.

From reading your comments (including the ones later in the thread), it certainly looks like you know what you're talking about. Perhaps you should work at Adobe.
Thank you.

Working for Adobe would be a no-go: Kovid and Harry could then blame me for all the little issues in Adobe Digital Editions.

On the other hand, Adobe's software architects and coders are usually really good; I don't know why ADE was implemented so poorly.
Old 06-20-2009, 03:26 PM   #33
Tuna
Zealot
As has been pointed out, the problem with the initial article is that it conflates issues with the formats themselves and issues with the readers that render those formats.

Netseeker is completely right that you don't need the entire stream in memory to be able to render something like an epub file. In fact you can go further than simply building the parse tree in memory - you can store it in an index file that lives alongside the document. Indexing the document need only be done once when it is first opened and could happen in the background as you're reading the first few pages. If you have a limited number of font choices (which is true for most e-readers) you could actually index every page within the document, complete with relevant style hints so that jumping to arbitrary points would always happen instantaneously.

The penalty for such behaviour is a more complex parser and some storage overhead (hardly an issue when 2 GB flash cards cost only a few dollars). Processor overhead really shouldn't be an issue, even on the oldest devices, though keeping it low requires some understanding of real-time systems. Where files are transferred to the e-reader through a library application on the user's PC, the index files could even be generated at the same time, leaving the e-reader to do the bare minimum of work to display any arbitrary page.
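The one-pass indexing described above can be illustrated with a toy sketch: a single streaming pass records where each block starts, so a later jump to an arbitrary block becomes a seek rather than a re-parse (the document layout and block granularity are invented for the example):

```python
import io

# A synthetic document of 500 fixed-format blocks, one per line.
doc = b"".join(b"<p>block %03d</p>\n" % i for i in range(500))

# Pass 1: record the byte offset at which each block starts.
offsets = []
pos = 0
for line in io.BytesIO(doc):
    offsets.append(pos)
    pos += len(line)

# Later: jump straight to block 250 without re-reading blocks 0-249.
buf = io.BytesIO(doc)
buf.seek(offsets[250])
block = buf.readline()
print(block)  # b'<p>block 250</p>\n'
```

A real index would also carry style state at each entry point (open tags, active CSS), but the principle is the same: pay the parsing cost once, then navigate by lookup.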

The issue here is that epub in particular (and anything XML-ish in general) lends itself to 'lazy' implementations. On modern PCs there is very little penalty for just hacking at a file, so the workarounds for dealing with large datasets just aren't common knowledge. I've worked for clients who have managed to produce 500 MB data files and only then wondered why they can take a while to process.

In general, a format like epub lends itself to transformation, so could be regarded as a 'transfer' format, which might be translated to a device specific variant that enables efficient rendering, storage and retrieval. Whilst there are pathological cases that can make parsing more complex, these can usually be transformed to simpler parse trees - and publishers should recognise that over-complex formatting benefits no-one.
Old 06-20-2009, 09:20 PM   #34
jgray
Fanatic
Quote:
Originally Posted by Tuna View Post
Indexing the document need only be done once when it is first opened and could happen in the background as you're reading the first few pages.
I don't know the internal workings, but MS Reader appears to do something along these lines. Notice when you open a large LIT ebook that you can start reading right away, but the page counter at the bottom is still churning away.
Old 06-20-2009, 10:39 PM   #35
kovidgoyal
creator of calibre
You certainly need to store the entire XML tag structure in memory to support CSS 2.1. You don't need to store the text nodes, which will reduce memory consumption for files that have a low tag-to-text ratio. However, you still have to read and parse the entire tag tree, whether you do it in a streamed fashion or not. The reason you have to read and parse the entire tree is to support CSS selectors. I suggest you read the following to understand just why it is necessary: http://www.w3.org/TR/CSS2/selector.h...dant-selectors
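To make the requirement concrete: a descendant selector such as `div p` needs to know an element's ancestors at the moment that element is encountered. A minimal sketch of doing that during a streaming parse, keeping only a stack of open tags rather than the full tree (the markup is invented for illustration):

```python
import io
import xml.etree.ElementTree as ET

xhtml = io.BytesIO(
    b"<html><body>"
    b"<div><p>styled</p></div>"
    b"<p>plain</p>"
    b"</body></html>"
)

matches = []
stack = []  # tags of the currently open ancestors
for event, elem in ET.iterparse(xhtml, events=("start", "end")):
    if event == "start":
        # "div p" matches any <p> with a <div> ancestor.
        if elem.tag == "p" and "div" in stack:
            matches.append(elem)
        stack.append(elem.tag)
    else:
        stack.pop()

print([e.text for e in matches])  # ['styled']
```

This only handles ancestor-based selectors; sibling combinators need extra bookkeeping, which is exactly the point debated later in the thread.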

It's certainly true that you don't have to store all the text content in memory, which means the 300KB size limit can probably be increased. But frankly, the increased programming complexity and consequent rendering fragility are not worth it. I think 300KB is a perfectly reasonable limit; EPUB creators simply have to keep it in mind.

As for pre-parsing and storing rendered versions of the file, I think that is an extremely inelegant solution that imposes an absolute restriction on display modification by the user. Say goodbye to free font resizing, line-spacing and margin adjustments.
Old 06-21-2009, 09:29 AM   #36
Tuna
Zealot
I'm fully aware of CSS selectors thank you :-) I've been involved with web browsers - at the software level - for just shy of 15 years now. There's no reason why you can't stream through an XML document and flatten the selector space as you index it - and as the range of styles in a document is rarely that extensive the penalty for doing so is minimal.

Rendering fragility is a poor excuse: navigating the parse tree is a task that can be modularised to act as both a driver and a restorer of index and style information. It's really not that difficult once you've got the framework in place. Any codebase that assumes or imposes an arbitrary size restriction on parsing a dataset that is itself unrestricted and shared with unknown third-party software is bound to fail, however generous those restrictions may appear to be.

Inelegant solutions are necessary where you are trying to provide the ideal user experience. Agreed that completely free font sizing, line spacing and margin adjustments would present an insurmountable task. However, it's worth noting that most devices don't offer unlimited options, and most users will only ever select a small subset of those. Indexing (not storing rendered versions - ugh!) need only be performed for the most recent and most preferred choices. It's only when the user is determined to cycle through every option whilst reading the last page of a document that the interface need degrade to worst-case re-parsing. Notice that even then, the experience is no worse than current 'all in memory' solutions. And with indexing for rendering being a multi-level process, even that worst case can be hurried along by removing the need to deal with overly complex parse trees.
Old 06-21-2009, 09:45 AM   #37
Jellby
frumious Bandersnatch
Quote:
Originally Posted by Tuna View Post
Any codebase that assumes or imposes an arbitrary size restriction on parsing a dataset that itself is unrestricted and shared with unknown third party software is bound to fail, however generous those restrictions may appear to be.
True. But it is also true that it's not sensible to expect unlimited size support in all applications. The ePub spec should probably have stated minimum file sizes (and maybe nesting levels or numbers of styles) that rendering software must support, so that a conforming file could be guaranteed to work in conformant readers. Readers with more resources available could support larger limits, but there would be some minimum we could rely on.
Old 06-21-2009, 09:57 AM   #38
netseeker
sleepless reader
Quote:
Originally Posted by kovidgoyal View Post
However, you still have to read and parse the entire tag tree, whether you do it in a streamed fashion or not. The reason you have to read and parse the entire tree is to support CSS selectors. I suggest you read the following to understand just why it is necessary: http://www.w3.org/TR/CSS2/selector.h...dant-selectors
Well, I know CSS 2.1 and descendant selectors very well, and I know that the W3C recommends parsing the whole element tree. But can we agree that, technically, only knowledge of an element's ancestors is required to support those selectors? No knowledge of the subsequent parts of the XML tree is needed.

Quote:
Originally Posted by kovidgoyal View Post
It's certainly true that you don't have to store all the text content in memory, which means the 300KB size limit can probably be increased. But frankly, the increased programming complexity and consequent rendering fragility are not worth it. I think 300KB is a perfectly reasonable limit; EPUB creators simply have to keep it in mind.
It's a restriction that isn't necessary, IMO. Maybe the next ePub renderer will impose a 200 KB limit, another will choose 180 KB, and so on. ePub creators shouldn't have to deal with restrictions that are not part of the ePub specification.

Quote:
Originally Posted by kovidgoyal View Post
As for pre-parsing and storing rendered versions of the file, I think that is an extremely inelegant solution and imposes an absolute restriction on allowing display modification by the user. Say good bye to allowing free font resizing, line space and margin adjustments.
Agreed, storing rendered versions of a file would be the wrong way.
Old 06-21-2009, 11:41 AM   #39
Tuna
Zealot
Quote:
Originally Posted by Jellby View Post
True. But it is also true that it's not sensible to expect unlimited size support in all applications.
I don't think that's the case. In virtually all modern programming environments there's no reason to impose file-size limits, except where system APIs or system resource allocation restrict the application.

Quote:
Originally Posted by Jellby View Post
The ePUB spec should probably have stated some minima for filesizes (and maybe nesting levels or number of styles) rendering software must support, so that a file can be guaranteed to work in conformant readers. Readers with more resources available could support larger limits, but there would be some minimum we could rely on.
Not at all. The parse tree required to display any given point in an ebook is never going to be complex enough to be an issue (we're talking kilobytes in systems that have had megabytes of user space memory for approaching a decade or so). If a device is resource restricted it might have to take the long way round to display a page, but that's a design decision on the part of the device manufacturer and shouldn't be the concern of the standards committee.

Certainly epub offers the developer many choices as to how they render the document. Nothing in the spec requires that any given section of the document be stored in memory all at once in order to be rendered.

The only 'reasonable' restriction might be to say that a device with X megabytes of storage should be able to render the largest single book that can fit on its storage. How they go about doing that is up to the firmware developer. Again though, it's really no business of the standards body.

Last edited by Tuna; 06-21-2009 at 11:57 AM.
Old 06-21-2009, 11:54 AM   #40
Tuna
Zealot
Quote:
Originally Posted by netseeker View Post

Agreed, storing rendered versions of a file would be the wrong way.
Let me be clear that I would never suggest storing rendered versions - that was a misunderstanding on kovid's part. My point was that files can be pre-parsed and indexed to support efficient rendering. At the base level, indexing can involve both flattening the parse tree (eliminating those pesky CSS selectors) and providing efficient search spaces for locating document parts.

Page indexing can be done separately and needn't be expensive: consider that you're talking about a few hundred indices for most books, so even if you allow for (say) a couple of dozen likely combinations of font size, line spacing and margin settings, your document index need not be more than a few tens of kilobytes. That's hardly a high price in return for instant page turns and accurate next/previous-page behaviour.
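The sizing estimate is easy to sanity-check with back-of-envelope arithmetic (all figures here are illustrative assumptions, not measurements from any real device):

```python
# Rough size of a per-settings page index of byte offsets.
pages = 400           # a typical novel
settings = 24         # "a couple of dozen" font/spacing/margin combos
bytes_per_entry = 4   # one 32-bit byte offset per page

per_setting = pages * bytes_per_entry   # bytes for one settings combo
total = per_setting * settings          # bytes for all combos
print(per_setting, total)  # 1600 38400
```

So roughly 1.6 KB per settings combination and under 40 KB in total, which is indeed negligible next to gigabytes of flash storage.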
Old 06-21-2009, 12:00 PM   #41
kovidgoyal
creator of calibre
Quote:
Originally Posted by Tuna View Post
I'm fully aware of CSS selectors thank you :-) I've been involved with web browsers - at the software level - for just shy of 15 years now. There's no reason why you can't stream through an XML document and flatten the selector space as you index it - and as the range of styles in a document is rarely that extensive the penalty for doing so is minimal.
So every time the user or some JavaScript makes a change to the DOM/CSS, you propose re-indexing the entire tree (at least up to the current point)? For example, calibre's EPUB viewer actually supports a reference mode that changes the DOM and CSS of document elements on the fly in response to user interaction. The scheme you propose might work well for static content that the user never interacts with or modifies, but not for anything else. And frankly, making that trade-off (removing size restrictions but making interactivity much slower) is just wrong.

Quote:
Inelegant solutions are necessary where you are trying to provide the ideal user experience. Agreed that completely free font sizing, line spacing and margin adjustments present an insurmountable task. However, it's worth noticing that most devices don't offer unlimited options, and most users will only ever select a small subset of those. Indexing (not storing rendered versions - ugh!) need only be performed on the most recent and most preferred choices. It's only when the user is determined to cycle through every option whilst reading the last page in a document that the interface need degrade to worst case re-parsing. Notice that even then, the experience is no worse than current 'all in memory' solutions. With indexing for rendering being a multi-level process, even that worst case can be hurried along by removing the need for dealing with overly complex parse trees.

Umm, imposing a size restriction is an inelegant solution, but one that works for end users, EPUB creators (provided they just keep it in mind) and the creators of EPUB document viewers. Instead you propose a system that offers a very slight benefit to EPUB document creators and a very large overhead for the creators of EPUB rendering software, with no benefit to EPUB end users. If people had to guarantee that EPUB renderers could render any size/complexity of XHTML on any device, the format would never have gotten off the ground.
Old 06-21-2009, 12:07 PM   #42
kovidgoyal
creator of calibre
Quote:
Originally Posted by netseeker View Post
Well, I know CSS 2.1 and descendant selectors very well, and I know that the W3C recommends parsing the whole element tree. But can we agree that, technically, only knowledge of an element's ancestors is required to support those selectors? No knowledge of the subsequent parts of the XML tree is needed.
I must concede that knowledge of the parent tree is all that's required. But does that really make a difference? (After all, on average you will be rendering a point halfway through the file, so the average speedup from parsing only what comes before is only a factor of two.) I guess where it will make a big difference is in jumping from one XHTML file to the next.
Old 06-21-2009, 12:09 PM   #43
kovidgoyal
creator of calibre
Quote:
Originally Posted by Tuna View Post
Let me be clear that I would never suggest storing rendered versions - that was a misunderstanding on kovid's part. My point was that files can be pre-parsed and indexed to support efficient rendering. At the base level, indexing can involve both flattening the parse tree (eliminating those pesky CSS selectors) and providing efficient search spaces for locating document parts.
You are proposing creating an indexed version with flattened CSS (in other words a version rendered to some sort of binary format): not a pixel rendering perhaps, but still a rendered version, and it will suffer from the same drawback, namely the need to re-render every time anything changes.
Old 06-21-2009, 12:21 PM   #44
pepak
Guru
Quote:
Originally Posted by kovidgoyal View Post
I must concede that, knowledge of the parent tree is all that's required.
Are you sure about that? I am almost certain that selectors such as :first-child and + are part of CSS 2.1, and it's most certainly not enough to know the parent tree to render them.
Old 06-21-2009, 12:24 PM   #45
kovidgoyal
creator of calibre
Quote:
Originally Posted by pepak View Post
Are you sure about that? I am almost certain that selectors such as :first-child and + are part of CSS 2.1, and it's most certainly not enough to know the parent tree to render them.
Shouldn't just the parent tree be enough for :first-child and +? Here 'parent tree' actually means not just ancestors but also all siblings that occur before a given element in document order. Perhaps 'pre-tree' would be a better term.
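The 'pre-tree' idea can be made concrete: for `:first-child` and the adjacent-sibling combinator `+`, it suffices to remember, for each open element, the tag of its most recently closed child. A minimal streaming sketch (the markup is invented for illustration):

```python
import io
import xml.etree.ElementTree as ET

xhtml = io.BytesIO(b"<body><h1>t</h1><p>first</p><p>second</p></body>")

results = []
prev_sibling = [None]  # last closed child tag, per nesting depth
for event, elem in ET.iterparse(xhtml, events=("start", "end")):
    if event == "start":
        first_child = prev_sibling[-1] is None   # matches :first-child
        after_h1 = prev_sibling[-1] == "h1"      # matches "h1 + *"
        results.append((elem.tag, first_child, after_h1))
        prev_sibling.append(None)  # fresh sibling state for children
    else:
        prev_sibling.pop()
        prev_sibling[-1] = elem.tag  # visible to the next sibling

print(results)
# [('body', True, False), ('h1', True, False),
#  ('p', False, True), ('p', False, False)]
```

Nothing after the current element is ever consulted, which is why only the ancestors and preceding siblings, the 'pre-tree', are needed.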

Tags
epub, mobi






