View Full Version : Is a better e-book format always better?


Bob Russell
05-18-2006, 05:32 PM
Of course a better e-book format is better, you say! Well, not so fast. Let me explain what I mean.

Consider the following spectrum (my apologies if the spacing doesn't come out nicely):

Text ....... HTML ........ Fancy Format
<------------------------------------------>

Clearly, HTML has some big advantages over plain text files. Hyperlinks, formatting and so forth are important improvements in content presentation. It's more flexible and competent.

We could plug in something for fancy format like Palm Doc, eReader, iSiloX, Plucker, Acrobat, LaTeX, Microsoft Word or probably a hundred more formats. Each of them has advantages over simple text or HTML that make them more competent in presenting and storing (e.g. compressing) content. PDF, for example, can represent a page much better than HTML, and is much more like the page of a book. In other words, each of these formats is "better", but it doesn't mean we prefer them.

More sophisticated formats are better in the sense of capability, but not necessarily better for us. Why not? Primarily, we are often concerned more with compatibility than capability. We want the format to be usable in any e-book reader, or editor. Just because it is powerful or flexible doesn't mean much if it's not "interoperable." It needs to be not only simple, but more importantly supported universally. So to update the diagram...

Text ....... HTML ........ Fancy Format
<------------------------------------------>
Simple/Interoperable........ Less Compatible

My point is simply to emphasize that there is great value in compatibility. Text and HTML are great because they are simple and interoperable with all kinds of software from e-book readers to browsers to word processors to format converters and so forth. Fancy formats give us things like support for one primary program, and maybe compression and DRM. So why haven't fancy formats caught on more? They are not naturally interoperable.

I am not familiar enough to speak to the value of proposed open formats. (I would love to see some simple overviews of that.) But I am pretty sure that other closed and complicated formats will have trouble dominating the e-book world. We need interoperability.

What surprises me is that we haven't seen existing standards applied to book and content presentation more. Or maybe I'm just not aware of it, I'm not sure. But why can't XML include content and presentation information? Or what about CSS? All of that can be compressed if necessary. Is there a reason it's not the basis of e-book formats? Simple is good!
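
To make the "content plus presentation, compressed if necessary" idea concrete, here is a rough Python sketch. It is not any existing e-book format: the sample HTML, the stylesheet, the NUL-byte separator and the helper names `pack` and `unpack` are all invented for illustration. It only shows that standard markup and a separate stylesheet can be bundled, compressed with zlib, and recovered unchanged.

```python
import zlib
from typing import Tuple

# Hypothetical sample content and stylesheet, made up for this sketch.
CONTENT = "<html><body><h1>Chapter 1</h1><p>It was a <em>dark</em> night.</p></body></html>"
STYLE = "h1 { font-size: 1.5em; } p { text-indent: 1em; }"


def pack(content: str, style: str) -> bytes:
    """Join content and stylesheet with a NUL separator, then compress."""
    payload = content.encode("utf-8") + b"\x00" + style.encode("utf-8")
    return zlib.compress(payload, 9)


def unpack(blob: bytes) -> Tuple[str, str]:
    """Decompress and split back into (content, style)."""
    raw = zlib.decompress(blob)
    content, style = raw.split(b"\x00", 1)
    return content.decode("utf-8"), style.decode("utf-8")


if __name__ == "__main__":
    blob = pack(CONTENT, STYLE)
    # The round trip returns exactly what went in.
    assert unpack(blob) == (CONTENT, STYLE)
    print(len(blob), "bytes after compression")
```

Once unpacked, the content is ordinary HTML and CSS again, so any browser or converter can read it. That is the interoperability argument in miniature.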

rlauzon
05-18-2006, 06:36 PM
One more thing about PDF that you alluded to: It's page-centric.

What does the concept of pages mean in an eBook?

In paper books, pages are used for:
1. Partitioning the book. People usually stop reading on a page/paragraph break.
2. Indexing - the index points to the page.
3. A place to put footnotes.

In an eBook, we need none of this.
1. The partitioning is now by paragraph. This is how I read eBooks on my Palm today. There are no "pages" but rather one large text with paragraph breaks.
Bookmarks can be put anywhere in the text.
2. Indexing no longer need to refer to the page - it can be some sort of link directly to the item indexed.
3. Footnotes no longer need to be put at the bottom of the page, but can be a link directly to the footnote text anywhere in the document.

So what do we need the concept of "pages" for? And why do we need a format that is based on this out-dated notion of "pages"?

BasilC
05-18-2006, 07:05 PM
Couldn't agree more. "Pages" are bad news when it comes to ebooks, and PDF is the worst news of all. I still haven't found a reliable way of reading a PDF file with tables or graphics (there are PDF readers that handle text pretty well, though turning a page still takes time). If only there were a cheap and reliable way of converting from PDF to HTML so that it could be converted by iSiloX or Sunrise! But the PDF conversion programs that you see are mostly horrendously expensive.

tribble
05-19-2006, 05:09 AM
I think an open XML book format would be best. It is easy to exchange, can include images and links, and can easily be converted. DRM issues would be a bit of a problem here, but some smart people will solve that.

On the other hand, you cannot dismiss the concept of pages, because every display uses pages to display the content. It is just that "fixed" pages, as in PDF at the moment, are not good for the variety of non-standardized reading devices. But if I recall correctly, Adobe is working on a resizable page format, which lets content flow depending on the page size set by the user.

yokos
05-19-2006, 07:31 AM
I will use my Iliad for reading "scientific" PDF files [converting tex -> with dvipdfm -> to pdf].

LaTeX is the format for publishing scientific texts. It's a horror to write mathematical equations in office suites like Word or OpenOffice.
It's easy to change the page size in LaTeX files: just change the preamble and your dvi file gets a new, perfect layout.

To make it clear: LaTeX files are txt files with layout instructions.

We will see how readable PDF files published in A4 format are on an A5 screen. Yes, PDF is page-centric, because it's an output format for printing on paper.

Dismissing the concept of pages makes no sense to me. This is how a printed book works: text with layout. I hope reading on the Iliad will feel the same as reading printed books.

Liviu_5
05-19-2006, 09:42 AM
Hi,

To me the ideal book format would be like this:

1: For regular books that consist mostly of text with possibly a picture here and there, so for most (nonscans) fiction books, the smallest universal format necessary which is txt for some, html for others (for all if you want chapter links, and other embedded goodies). I would have everything (including the pictures) embedded in one file with the appropriate folder in the html case

2: For books that contain text, but also have figures, diagrams and the like everywhere in an essential way (most technical books, comics...), and for image-only pdf/djvu scans whether they are of type 1 or 2 books, a format that can render the book a page at a time without reading the whole book for info (like pdf or djvu does now) unless you so choose. The book itself may take lots of hard disk/flash memory space (cheap), but the rendering should take very little RAM (expensive). The closest I have found so far is to use jpg images embedded in a blank html, cut to device size and maybe in two (Nokia 770 or Ebookwise 1150 for me) if appropriate, but I have to work somewhat to tailor them from my pdf/djvu/scans and I would rather have a format that does that automatically. The djvu script for Librie that I saw is a hack in that direction (takes a djvu page, cuts it to size, pads, makes it Librie readable and displays it), but I have no idea how fast it works. A format that would contain an executable script of this type with the possibility of cutting the page to adjust to device screen without scrolling/zooming would be ideal.

Liviu

Impixi
05-19-2006, 09:53 PM
The trouble with plain text is it lacks the necessary 'expressibility'. Even for fiction, you occasionally need to emphasise words or clarify/distinguish blocks of text (eg for a change of tense or perspective, etc).

For online viewing, I think HTML is suitable for most purposes. The biggest drawback, however, from a publisher's point of view, is the lack of DRM. The absence of file compression and 'packaging' can also be a nuisance.

For documents that will be printed, PDF is probably the best bet. Again, the biggest drawback from a publisher's point of view is the lack of decent DRM.

Liviu_5
05-19-2006, 11:38 PM
Hi,

While I agree that plain text lacks expressibility, I never found that feature useful in most books I've read, and many times it is downright annoying ("when character A mindtalks with character B I put it in italics" kind of stuff), so I am happy to convert to plain text. Personally I believe a novel should convey the meaning through text only (leaving aside maps and stuff like that), but that is my taste. Project Gutenberg, with all the classics, does just fine in txt, for example. I like html more for embedded stuff like table of contents, chapters, maps...
About DRM, I just do not touch such books. I'd rather buy them in print and scan them for my personal use, even though it takes a little time, than buy a rental book (which any DRM implies: if you accept DRM, you do not own the book, you just rent it).


Liviu

Robert Marquard
05-20-2006, 11:54 AM
An ebook should have the same level of expression as a real book. The format should be standard. The format should allow new ways of expression (like links).
This naturally leads to HTML. It is the basis of most ebook formats anyway.
PDF is optimized for a fixed page size per book so it is unusable.

rsperberg
05-26-2006, 07:25 AM
Bob, I've got a couple problems with the situation as you lay it out.

In the first place, a number of your "fancy format" formats are really just HTML under the covers anyway, so the amount extra that they bring to the table isn't much.

And then, second, a major reason for their being on the "less compatible" end of the spectrum has nothing whatsoever to do with their complexity/capability and everything to do with proprietary thinking.

If you think about the Open Office document format that has recently been accepted as an ISO standard (and before that as an OASIS specification), it would fall way to the right on the complexity/capability spectrum but, being fully open, has the potential to be all the way on the left in terms of interoperability and acceptance.

One thing that I think you want to include as a factor in complexity is not merely the formatting capabilities, but the metadata that an arbitrary XML file can contain. If, for instance, every company name in my history of Wall Street is indicated by <company> tags, then it's easy to search and locate the instances when I'm looking for Charles Schwab the company and not Charles Schwab the person.
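
As a minimal sketch of that kind of search (the sample markup, the `<person>` tag and the helper name `find_tagged` are assumptions made up for this example, not part of any real e-book schema), Python's standard `xml.etree.ElementTree` module can tell the two apart where a plain text search cannot:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of a marked-up book; element names are illustrative only.
SAMPLE = """<book>
  <p>This chapter covers <person>Charles Schwab</person> the man
  and <company>Charles Schwab</company> the firm.</p>
  <p><company>Charles Schwab</company> appears again here.</p>
</book>"""


def find_tagged(xml_text, tag, name):
    """Return every <tag> element whose text is exactly `name`."""
    root = ET.fromstring(xml_text)
    return [el for el in root.iter(tag) if (el.text or "").strip() == name]


if __name__ == "__main__":
    companies = find_tagged(SAMPLE, "company", "Charles Schwab")
    people = find_tagged(SAMPLE, "person", "Charles Schwab")
    # A plain substring search cannot separate the firm from the man.
    plain_hits = SAMPLE.count("Charles Schwab")
    print(len(companies), len(people), plain_hits)
```

The point is only that the distinction lives in the markup: the same search on stripped plain text would return every occurrence of the name, whichever sense was meant.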

If we think of e-books as being designed for current publishing, then markup of metadata seems less significant. But what if every piece of business communication shared the same markup as the e-books? What if the e-reader wasn't an e-book reader, but instead a tool optimized for reading anything and everything we have to read on-screen?

I think in that case that fancier formats, or more complex formats, would be so much more useful that they would become the de facto standard and thus widely accepted in many applications.

Roger Sperberg

Bob Russell
05-26-2006, 09:46 AM
Thanks for your thoughts Roger. I agree that in the right circumstances more complexity is not always bad. HTML is more complex than text, for example. But that's really more of an exception than the rule. It's really tough to get a format universally adopted like HTML or MP3.

But my point is really not that we will never see another universally adopted e-book format (I hope we do), but that just because a format has better features doesn't mean it's better for users. Knowing that you can read a format in almost any e-book reader can outweigh all kinds of other benefits.

In the first place, a number of your "fancy format" formats are really just HTML under the covers anyway, so the amount extra that they bring to the table isn't much.

Ah, but just because something is HTML-based doesn't mean it's compatible. Once you change it, it's not HTML anymore, even if it's just due to compression. My basic point is that compatibility and universality are more important than improved functionality.

And then, second, a major reason for their being on the "less compatible" end of the spectrum has nothing whatsoever to do with their complexity/capability and everything to do with proprietary thinking.

If you think about the Open Office document format that has recently been accepted as an ISO standard (and before that as an OASIS specification), it would fall way to the right on the complexity/capability spectrum but, being fully open, has the potential to be all the way on the left in terms of interoperability and acceptance.

Good point, but I don't think "proprietariness" is the key any more than complexity is. They are both just contributing factors to interoperability and universality, which is the key. Proprietariness (is that a word?) is very important to universal adoption, but openness doesn't guarantee compatibility either -- it has to be widely adopted to really add value to a wide audience.

One thing that I think you want to include as a factor in complexity is not merely the formatting capabilities, but the metadata that an arbitrary XML file can contain. If, for instance, every company name in my history of Wall Street is indicated by <company> tags, then it's easy to search and locate the instances when I'm looking for Charles Schwab the company and not Charles Schwab the person.

If we think of e-books as being designed for current publishing, then markup of metadata seems less significant. But what if every piece of business communication shared the same markup as the e-books? What if the e-reader wasn't an e-book reader, but instead a tool optimized for reading anything and everything we have to read on-screen?

Very cool! You have described a manner in which adding complexity can actually make a format more widely popular. If you add features that allow it to be used across purposes, so that there can be a common popular format for publishing, e-books, and all kinds of other document usage, then it has a much greater chance of success. Makes sense. And an e-book format could then "ride the coattails" of adoption of the format for other purposes. Interesting view!

I think in that case that fancier formats, or more complex formats, would be so much more useful that they would become the de facto standard and thus widely accepted in many applications.

On the other hand, is it really possible for one format to do all those things well?

Liviu_5
05-26-2006, 02:05 PM
Hi,

I do not see any acceptable current format encompassing the wide range of possible ebooks.

PDF does it to a great extent but is so bloated and rigid as to be unacceptable.

For most fiction books you do not really need more than txt or html, so why bloat the file? And again I believe strongly that we need narrative in our lives, so fiction as written/sung/narrated since humanity exists will still exist with us for a long time.

For nonfiction, I see great advantages in "externalising" the book (through links, notes, comments...) but again I see a need to balance that against bloat.

Anyway, the slow progress of e-books is not due to the lack of a universal format, but to a lack of affordable e-readers and to DRM stupidity. Until we get a colour e-reader that costs $100 or less, is paperback-sized or smaller, weighs 400 grams or less and has at least 800x480 resolution, along with tons of $5-6 DRM-free e-books, we could have the perfect format and it still would not matter.

Liviu

Chaos
05-26-2006, 03:20 PM
The absence of file compression and 'packaging' can also be a nuisance.
Formats like Plucker and iSilo are just compressed, packaged, somewhat re-formatted HTML. Plucker, for example, can be unpackaged into the HTML that the Palm/PPC device renders with a viewer.

I don't see why people really want to put DRM on books of all things. I mean, does enough of the population seriously read enough to warrant that? ;)

Personally I think HTML is the best current format for ebooks. Very compatible, common enough to be usable for the foreseeable future, and plain text enough to be permanently readable. Next to this, plain text is good, but lacks some formatting needed for some books. (Italics for thinking, pictures/diagrams in technical books, images in illustrated books, etc. - HTML doesn't have these limitations, which is why I put it first.)

rlauzon
05-26-2006, 04:13 PM
I don't see why people really want to put DRM on books of all things. I mean, does enough of the population seriously read enough to warrant that?

DRM is sold to the ignorant as a "prevent piracy" measure. After all, no one likes to have something stolen from them. So many people believe it.

To the not-so-ignorant, DRM is about control - not piracy. Control over the circumstances under which the content can be used. Control over the sale of the item (remember many years back when the RIAA tried to get a fee from every second-hand CD sold?). Control over the work forever - not for the "limited time" that is given by copyright law.

Bob Russell
05-26-2006, 04:34 PM
Surely major corporations wouldn't try that sort of thing? Oh wait, didn't we just hear about this?!!!... http://itvibe.com/news/4063/

Liviu_5
05-26-2006, 05:20 PM
Hi,

Unbelievable!! I am not into gaming so this does not affect me personally, but still if they do this (Sony making it illegal to sell second hand games as per cited link), it would be a gross violation of consumer trust and of ownership rules.

Liviu

MatYadabyte
05-27-2006, 10:17 AM
In the early days of ebooks on the Psion and Palms, the issues were more to do with file size and render speed than with in-page features and app features. But these are now pretty much redundant requirements on most devices, so we should be looking for a more generic format, HTML being the obvious choice. As Bob says

Mat
My Blog; http://www.salted.net