MobileRead Forums - View Single Post - What format to store books in? What software to read them with?

nairbv · 12-28-2007, 02:46 AM

kovidgoyal:

A custom XML format would reduce to HTML with a very short, simple XSL stylesheet. No extra code would really be involved, since the spec would provide a default (and obviously device-customizable) XSL stylesheet that converts the whole thing to HTML in a consistent way. I'm not sure you'd even see a difference in how the browser rendered it.

And it's not like this means providing a special program everyone has to rely on. XSL stylesheets are a standard mechanism: XSLT is a W3C Recommendation. It's more like a default (but overridable) stylesheet that would go hand-in-hand with the DTD.

Look at, for example: http://www.w3schools.com/xml/simplexsl.xml
Here, XML holding very specific custom data renders just fine in Firefox, because the browser looks up the XSL stylesheet and converts to HTML internally (if you view source you'll just see XML). (The example link came from http://www.w3schools.com/xml/xml_xsl.asp)
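
To make the idea concrete, here is a minimal sketch of the kind of mapping a default stylesheet would perform. Python's standard library has no XSLT processor, so this does the mapping by hand instead of running a real stylesheet. The element names (`book`, `chapter`, `label`, `title`, `para`) are invented for illustration and don't come from any actual spec.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical book markup; these element names are assumptions,
# not part of epub or any published format.
BOOK_XML = """<book>
  <title>Example Book</title>
  <chapter>
    <label>Chapter 1</label>
    <title>The Beginning</title>
    <para>It was a dark &amp; stormy night.</para>
  </chapter>
</book>"""


def book_to_html(xml_text: str) -> str:
    """Map each purpose-specific element to one HTML element,
    roughly the job one template per element would do in XSL."""
    book = ET.fromstring(xml_text)
    parts = ["<html><body>"]
    # The book title is the direct <title> child of <book>.
    parts.append(f"<h1>{escape(book.findtext('title', ''))}</h1>")
    for chapter in book.findall("chapter"):
        label = chapter.findtext("label", "")
        title = chapter.findtext("title", "")
        parts.append(f"<h2>{escape(label)}: {escape(title)}</h2>")
        for para in chapter.findall("para"):
            parts.append(f"<p>{escape(para.text or '')}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


print(book_to_html(BOOK_XML))
```

A device that wanted different output (smaller headings, plain text, whatever) would only need to swap the mapping, not the stored book.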

As a *base* format I think it would make more sense, for two reasons. First, you could store all the data in the most practical way, while the reader software would still interpret it as HTML or XHTML or TXT or however you told it to. Second, it would give you the fastest and easiest means of conversion to other formats.

I feel like hanging onto HTML standards doesn't make writing tools any easier, and if anything it makes it more difficult, because the epub files end up so messy. And all of that comes at the expense of our ability to store inherently bookish data.

Also, it's not about straitjacketing people; it's about making sure it's easy to parse out the relevant data. If someone's writing a custom browser and doesn't want to support all of HTML, for example, they can support a subset. Epub may have chosen a subset of XHTML to address this complexity issue, but it's a rigid subset. If they had chosen a purer XML, they could have left it up to the users (reader software developers and parser writers who make minor tweaks to the XSL stylesheet) to decide what subsets *they* wanted to support. I think this would inherently add flexibility, not straitjacketing.

It would even benefit advertisers and such (not that I'm advocating helping them), since a minor tweak in the XSL could throw advertisements or anything else into a margin on the page. With the whole book stored in HTML, how do you go about adding non-book-related information on the display end? It would have to be in a separate window of some sort, which is less flexible.

Also, it's about reducing confusion. I don't know that much about epub, but it seems that someone creating an epub file might unthinkingly put a chapter title in an h1 tag, or in some HTML metadata tag, or in some epub meta tag, or in the HTML's title tag, or even just add it in bold text at the start of a chapter. There are dozens of issues like this that would be resolved with a more rigid, consistent, purpose-driven format. If you think giving someone one and only one place to store a title is "straitjacketing" then I guess we just disagree. Of course there might be more than one kind of title, like a label ("chapter 1" vs. "section 1") and then the actual title of the chapter, but that distinction is just one more example of how a purpose-driven XML would be better suited to storing the data in a non-confusing manner.
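
As a rough illustration of the "one and only one place" point, here is a small sketch of how a tool could pull out chapter labels and titles when the format gives each of them a dedicated element. The element names (`chapter`, `label`, `title`) are hypothetical, made up for this example rather than taken from epub or any other spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical purpose-driven markup: every chapter has exactly one
# <label> and one <title>, so there is nothing to guess at.
BOOK_XML = """<book>
  <chapter>
    <label>Chapter 1</label>
    <title>The Beginning</title>
    <para>...</para>
  </chapter>
  <chapter>
    <label>Chapter 2</label>
    <title>The Middle</title>
    <para>...</para>
  </chapter>
</book>"""


def table_of_contents(xml_text: str) -> list[tuple[str, str]]:
    """Return (label, title) pairs, one per chapter, in document order."""
    book = ET.fromstring(xml_text)
    return [
        (chapter.findtext("label", ""), chapter.findtext("title", ""))
        for chapter in book.iter("chapter")
    ]


for label, title in table_of_contents(BOOK_XML):
    print(f"{label}: {title}")
```

Doing the same against arbitrary HTML means checking h1 tags, title tags, meta tags, and bold text, and then guessing which one the author meant.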

I know I'm stuck with epub for now, and this is all just theoretical, but I feel like what I'm suggesting would have made more sense. I'm not seeing any arguments that convince me otherwise.
