The Mobipocket format: Starring Leonardo DiCaprio and Kate Winslet


schmidt349
11-24-2007, 05:37 PM
Long-time lurker, first-time poster.

So, having just come into possession of an Amazon Kindle, I thought I'd load it with some documents that I have in various text-based formats (DocBook XML being chief among them). I read that it supports the Mobipocket format, and being somewhat adept with Perl, I figured I'd whip up some conversion software with the help of CPAN.

What follows is a tale of horror such as you can't imagine.

Just for a lark, I started with the following:

$ strings KindleUsersGuide.azw

and got this:

Kindle_Users_Guide
BOOKMOBI
MOBI
EXTH
08/01/2007
Amazon.com
Amazon.com
REF000000
Reference
Q<P>An overview of all the Amazon Kindle features and how to use them.</P>
Kindle User's Guide
<html><head><guide><reference
itle="Table
ontents"
toc"
ilepos=0
232 /
.Welcome
start
8616
{snip}

No worries, I thought, it's not a ZIP or a GZIP archive, so they're probably working their own proprietary mojo with some kind of compressed container format. I can see strings up top that look like they're pretty clearly identifiers of some kind (Kindle_Users_Guide, BOOKMOBI, MOBI, and EXTH). I wasn't all that enthusiastic about reverse-engineering somebody's proprietary binary file format, so I visited Mobipocket's web site to look for a document specification.

That was my first mistake.

Nowhere do the Mobipocket people actually give you the secret sauce for their file format. No C code examples, no header or binary structure descriptions, nothing. Their Windoze-only "Mobipocket Creator," despite being marketed as "free software," is anything but -- I almost wish the FSF had a trademark on that term so they could do a legal beatdown on anyone who calls their software "free" just because it doesn't cost anything.

So, no help whatsoever from the Mobipocket crowd. I did discover, though, while browsing their forums that file extensions "prc" and "pdb" are synonyms for "mobi". So I Googled those, thinking that maybe someone somewhere had already done my homework for me.

I knew something was wrong immediately when I was redirected to a bunch of Palm OS-related websites. Imagine my horror when I found out that the mobipocket document container is actually a Palm Database file, a monstrosity that stores everything in a bizarre nonstandard record structure instead of a nice friendly POSIX-compliant directory hierarchy. The sauce on the goose: it stores data in big-endian format because it was originally designed to be used by the very first Palm Pilots, which had Motorola 68000-series microprocessors in them. Wow.
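A note for anyone following along: the fixed Palm Database header really is just big-endian fields at fixed offsets, so it can be read without any CPAN module. The Python sketch below assumes the standard PDB layout (32-byte NUL-padded name, 4-byte type and creator at offsets 60 and 64, record count at offset 76, then one 8-byte entry per record whose first 4 bytes are that record's file offset). `read_pdb_header`, `PdbHeader` and the synthetic header in the usage example are made up for illustration.

```python
import struct
from typing import NamedTuple

class PdbHeader(NamedTuple):
    name: str
    db_type: str
    creator: str
    record_offsets: list

def read_pdb_header(data: bytes) -> PdbHeader:
    """Parse the 78-byte Palm Database header and its record list.

    Every multi-byte integer in a PDB file is big-endian ('>' in struct).
    """
    if len(data) < 78:
        raise ValueError("too short to be a Palm Database file")
    # Bytes 0-31: NUL-padded database name
    name = data[:32].split(b"\0", 1)[0].decode("latin-1")
    # Bytes 60-67: 4-byte type + 4-byte creator ("BOOK" + "MOBI" for Mobipocket)
    db_type, creator = struct.unpack_from(">4s4s", data, 60)
    # Bytes 76-77: number of records that follow the header
    (num_records,) = struct.unpack_from(">H", data, 76)
    offsets = []
    for i in range(num_records):
        # Each entry: 4-byte data offset, 1-byte attributes, 3-byte unique ID
        (offset,) = struct.unpack_from(">I", data, 78 + 8 * i)
        offsets.append(offset)
    return PdbHeader(name, db_type.decode("latin-1"), creator.decode("latin-1"), offsets)

if __name__ == "__main__":
    # Synthetic two-record header, not taken from a real file
    fake = bytearray(78 + 8 * 2)
    fake[:18] = b"Kindle_Users_Guide"
    fake[60:68] = b"BOOKMOBI"
    struct.pack_into(">H", fake, 76, 2)
    struct.pack_into(">I", fake, 78, 96)
    struct.pack_into(">I", fake, 86, 4096)
    print(read_pdb_header(bytes(fake)))
```

On a real file you would pass the bytes of something like `KindleUsersGuide.azw`; the "BOOKMOBI" string from the `strings` dump is exactly the type and creator fields read here.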

Thankfully CPAN has modules for everything, so I fired up Palm::PDB and Palm::Doc, almost hoping that they wouldn't be able to parse the file. However, they didn't have any problems grokking the file structure, and my worst fears were realized.

These examples of rotten HTML are drawn from the finally decoded content of the Amazon Kindle Manual, which I grabbed off the device.

Let's start at the beginning:

<p width="0em"><font face="serif">Thank you for purchasing Amazon Kindle. You are reading the Welcome section of the <i>Kindle User's Guide</i>. This guide provides an overview of Kindle and highlights a few basic features so you can start reading as quickly as possible.</font></p>

After I came to and peeled my face out of the keyboard I'd just spent five minutes banging it into, I glanced behind myself reflexively, half-expecting to see a blue police box or Billy Pilgrim or some other indication that I had been flung back in time to 1997.

The <font> element is one of those great horrors that we thankfully put to rest with HTML 4.0, XHTML 1.0, and CSS BEFORE THE END OF THE LAST CENTURY. The <i> tag is a relic of the same era. These are all examples of HTML 3.2-style mixing of document structure and formatting, which isn't supposed to happen under any circumstances in this day and age. You're supposed to use the style attribute along with the generic inline <span> container.
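The cleanup being asked for is mechanical enough to script. Here is a minimal sketch using Python's standard `html.parser` that rewrites `<font face=...>` and `<font color=...>` as `<span style=...>` and copies everything else through. It is not a sanitizer: it drops other `<font>` attributes (such as `size`), comments and doctype declarations, and the `FontToSpan` class is a made-up name.

```python
from html import escape
from html.parser import HTMLParser

class FontToSpan(HTMLParser):
    """Rewrite presentational <font> tags as <span style="..."> (sketch only)."""

    # <font> attributes we know how to express in CSS; others are dropped
    CSS_FOR = {"face": "font-family", "color": "color"}

    def __init__(self):
        # Keep entity references as written instead of decoding them
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            rules = [f"{self.CSS_FOR[k]}: {v}" for k, v in attrs if k in self.CSS_FOR and v]
            self.out.append(f'<span style="{escape("; ".join(rules))}">')
        else:
            self.out.append(self.get_starttag_text())  # copy other tags verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.out.append("</span>" if tag == "font" else f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

def font_to_span(markup: str) -> str:
    parser = FontToSpan()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)

if __name__ == "__main__":
    print(font_to_span('<p width="0em"><font face="serif">Thank you</font></p>'))
```

Note that `html.parser` lowercases tag names, so a mismatched closer like `</H4>` comes out as `</h4>`.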

How am I supposed to convert into a format that doesn't even validate as HTML 3.2? How am I expected to use a monstrosity that doesn't conform to ANY of the ebook standards we've established over the last ten years?

The IDPF people have been working on these problems for ages. They came up with a bunch of specifications years ago that would have prevented this nightmare. But this was like the greatest hits of Netscape 2.0. I saw the <center> tag. I saw <li> tags that weren't closed. I saw illegal entities like &. Craploads of tags had the wrong case for their closers (i.e., <h4></H4>). Picture references didn't comply with Dublin Core or anything even close to standard. Hell, I half-expected to run into <blink> and <marquee>.

I could not believe I was looking at document markup from the user guide to a device that's supposed to be bleeding-edge.

If someone on this forum is from Mobipocket, I want to know how in good conscience you can continue to use a completely proprietary document container and HTML that looks like I wrote it back in 5th grade. To everyone else I recommend in the strongest possible terms that this format be avoided wherever possible.

I really, really hope that Amazon adds .epub support to the Kindle sometime soon. I already tried loading a document in that format on the device but was told it's unsupported. Otherwise you really are going to have to rely on Amazon for all your content, and good luck using it anywhere else even without DRM.

Nate the great
11-24-2007, 05:47 PM
You know, a large number of people try to claw their eyes out when they first see the insides of a Mobipocket file. I'm glad it didn't happen to you. ;)

Welcome to MobileRead.

HarryT
11-24-2007, 05:50 PM
Why not just use the tools that the rest of us use to generate Mobi files - Book Designer or the Mobi Desktop Reader - instead of beating yourself up over something you have no control over?

Yes, Mobi files are Palm Resource (PRC) files; the "BOOKMOBI" string you found is what identifies the specific data contained in the file. "TEXtREAd" is an alternate that you'll find in other Mobi files.

Mobi is pretty much a de facto eBook industry standard - it's certainly available on far more devices than any other format, and has more books available than any other format. There's no point in having hysterics over the fact that it's a rather elderly file format; it won't change and you'll give yourself a heart attack :).

Steven Lyle Jordan
11-24-2007, 05:54 PM
Welcome, schmidt349!

Yeah, okay, Mobi is old. Really old. And they're not especially helpful to programmers and hackers, pretty much leaving them to their own devices. On the other hand, it has the benefit of being able to run on almost any platform imaginable, and already has readers for said platforms. Sometimes there's value to being "old."

For your purposes, you should still be able to do the job of converting... even if you have to make it a 2-step process and convert your DocBook to some other format (like Word DOC or HTML) that Mobipocket Creator, or some other established conversion software, can work from... right? Or was there something I'm missing in your post here (other than your obvious shock and indignation at Mobi)?

Glad you're trying out a Kindle... quite a few of us are experimenting with it, me getting my e-books onto Amazon, for instance. You'll have to let us know how well it reads the DocBook files (once you get them in there).

schmidt349
11-24-2007, 06:04 PM
Well, just a couple of comments, and please don't take them personally.

You can't justify the mobipocket format's intractability by saying "it supports old devices" or "it's a de facto standard." That's the same logic that Microsoft uses to keep us locked into their proprietary binary document formats (and now proprietary XML document formats). If there are industry standards that everyone else has agreed to it makes absolutely no sense not to follow them except if they're trying to monopolize the market by using standards lockout.

It's incidentally the same logic that gave us HTML hell back in the nineties. Neither Netscape nor Microsoft wanted to play by a common set of rules; instead they just did whatever they pleased with HTML as a standard, and by the end it really wasn't one.

The good news is that the Kindle uses the NetFront browser (v3.3) to render HTML. That should make it XHTML and CSS compliant, so if I fiddle things properly I may be able to find a way to go from DocBook via XSLT to XHTML/CSS. Wish me luck.
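The real pipeline here would normally be a set of XSLT stylesheets (for example the DocBook XSL stylesheets). As a much smaller stand-in that needs only the Python standard library, the sketch below walks a DocBook 4-style (non-namespaced) tree with `xml.etree.ElementTree` instead of XSLT and maps a handful of elements to XHTML. The `TAG_MAP` subset is an assumption covering only a sliver of DocBook; anything unmapped becomes a `<div>`.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Tiny, assumed subset of DocBook -> XHTML element names
TAG_MAP = {
    "para": "p",
    "emphasis": "em",
    "itemizedlist": "ul",
    "orderedlist": "ol",
    "listitem": "li",
    "section": "div",
    "chapter": "div",
}

def convert(db_elem, depth=1):
    """Return an XHTML element for one DocBook element (recursively)."""
    if db_elem.tag == "title":
        # Titles become headings; section nesting depth picks h1..h6
        tag = f"h{min(depth, 6)}"
    else:
        tag = TAG_MAP.get(db_elem.tag, "div")
    out = ET.Element(tag)
    out.text = db_elem.text
    for child in db_elem:
        child_depth = depth + 1 if child.tag in ("section", "chapter") else depth
        new_child = convert(child, child_depth)
        new_child.tail = child.tail  # keep text that follows the child
        out.append(new_child)
    return out

def docbook_to_xhtml(source: str) -> str:
    root = ET.fromstring(source)
    html = ET.Element("html", xmlns=XHTML_NS)
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = root.findtext("title", default="Untitled")
    body = ET.SubElement(html, "body")
    body.append(convert(root))
    return ET.tostring(html, encoding="unicode")

if __name__ == "__main__":
    print(docbook_to_xhtml(
        "<article><title>Intro</title><para>Hello <emphasis>world</emphasis>!</para></article>"))
```

A proper XSLT route handles namespaces, cross-references and metadata that this toy ignores; the point is only that the output is plain XHTML a CSS-aware browser can style.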

tompe
11-24-2007, 06:19 PM
Why not just use the tools that the rest of us use to generate Mobi files - Book Designer or the Mobi Desktop Reader - instead of beating yourself up over something you have no control over?


Oh, they have released a Linux version of these tools then. Or are you just assuming that everybody uses Windows?

schmidt349
11-24-2007, 06:42 PM
Yeah, that was a sticking point for me as well-- I don't do Windows, it gives me heartburn.

Hadrien
11-24-2007, 06:59 PM
They're still supposed to release mobigen on Linux, sometime in the future, in a galaxy far far away...

jasonkchapman
11-24-2007, 08:00 PM
You can't justify the mobipocket format's intractability by saying "it supports old devices" or "it's a de facto standard."

I don't think anyone's trying to justify the format. It's just that your complaints are a decade old. Everyone knows what a horror the format is internally.

From a technological view, the format is a travesty. From a marketing view, it's the closest thing to a de facto standard that there is in the commercial e-book market. From a technological view, Amazon probably should have developed a new format, or better, worked from an existing open standard. From a marketing view, Amazon's use of the Mobipocket format at least gives the hint of a promise of interoperability in the future.

Personally, I'm willing to bet that to Amazon's target market, the ones who are going to decide if e-books really matter or remain a niche market, things like HTML, XML, OEBPS, etc. are just strings of meaningless technobabble.

igorsk
11-24-2007, 09:42 PM
I'm thinking they're just following the good old rule: "If it ain't broken, don't fix it (http://www.everything2.com/index.pl?node=if%20it%20ain't%20broke%2C%20don't%20fix%20it)".

Steven Lyle Jordan
11-25-2007, 12:22 AM
You can't justify the mobipocket format's intractability by saying "it supports old devices" or "it's a de facto standard." That's the same logic that Microsoft uses to keep us locked into their proprietary binary document formats (and now proprietary XML document formats). If there are industry standards that everyone else has agreed to it makes absolutely no sense not to follow them except if they're trying to monopolize the market by using standards lockout.

It's incidentally the same logic that gave us HTML hell back in the nineties. Neither Netscape nor Microsoft wanted to play by a common set of rules; instead they just did whatever they pleased with HTML as a standard, and by the end it really wasn't one.

I wouldn't compare Mobi's creation and development with the Browser Wars... in that case, there was a standardized format that two companies corrupted for their own purposes. In Mobi's case, there was no accepted e-book standard, and they were like every other e-book publisher, creating their own standard based on what they felt was needed.

No, there was nothing right about those publishers creating their own e-book format. They, and all the other e-book makers, could have gone with the closest thing we had to a standard universal format at the time--PDF--and been done with it.

It would have been nice if Amazon had come out with the Kindle about a year from now, based on ePub, the closest thing we have to a new universal XHTML-based format.

But the Kindle came out now, and right now, Mobipocket is the most widely used and widely available e-book standard there is, a no-brainer for a company that wants to sell an e-book reader. You can't blame Amazon for going for the existing and dominant format, instead of creating their own or going with a brand new, untested format... it's enough that they're getting into the hardware business, it's too much to have them turn into programmers, too.

I think we're all hoping that a sensible universal format, like ePub, becomes ubiquitous in the e-book arena. If so, ePub documents will be able to be converted to Kindle/Mobi (and any other format) easily enough.

In the meantime, all we can do is take the tools we're given, and figure out the best way to use them... or, if they are too distasteful, to opt out of using them at all. In other words, that fight is already over, it's time to pick a new battle.

wallcraft
11-25-2007, 02:04 AM
I would like to see an open source alternative to mobigen.exe, but the original may be "good enough". It takes an OEB document (html with an OPF file), or plain HTML, and produces a MOBI file. So the way to produce MobiPocket books is the same as for several of the older e-book formats - start with an actual OEB document and then convert it to the required bastardized commercial OEB variant.

MobiPocket has consistently put all its Desktop resources into Windows, so mobigen.exe is Windows only. My guess is that it would take about a day for someone with its source code to get it working natively under Linux and OSX, but perhaps it is incredibly Windows-centric (i.e. it might take a week to port) and perhaps MobiPocket only has Windows programmers.

In any case, it is a command-line program that runs using wine under Linux (and probably also using wine under OSX). Wine isn't that hard to install and use under Linux, I don't know how well it works under OSX.

For an earlier related discussion (with many links), see MobiPocket TOC using mobigen (http://www.mobileread.com/forums/showthread.php?t=14911)

schmidt349
11-25-2007, 02:51 AM
The problem is not that it's hard to write a Mobipocket parser for any given platform. Using XUL to render the content and grep/XSLT to transform and display the old Mobipocket content seamlessly would take me all of a week. I've already written a Perl program that "unzips" Mobipocket files into a directory structure and rewrites all the nonstandard links in the document so you can just view it in a Web browser, provided of course you can find one that speaks HTML 2.0 (hint: Safari doesn't, at least not properly).
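Mobipocket's internal links are `filepos` attributes holding byte offsets into the decompressed text, not named anchors, which is why they need rewriting before a browser can follow them. One way to do that, sketched in Python under the assumption that the whole book text has already been decompressed into a single `bytes` object: drop an `<a id>` marker at every target offset (working from the highest offset down so lower offsets stay valid), then turn each `filepos` attribute into an ordinary fragment link. The `filepos<N>` id scheme is a hypothetical choice.

```python
import re

# Matches filepos=0000001234, filepos="1234" and filepos='1234'
FILEPOS = re.compile(rb"""filepos=["']?(\d+)["']?""")

def rewrite_filepos_links(text: bytes) -> bytes:
    """Turn Mobipocket byte-offset links into id/href pairs (sketch)."""
    targets = sorted({int(m.group(1)) for m in FILEPOS.finditer(text)}, reverse=True)
    for pos in targets:
        if pos > len(text):
            continue  # dangling link: no anchor to insert
        # Highest offset first, so inserting here never shifts a lower target
        text = text[:pos] + b'<a id="filepos%d"></a>' % pos + text[pos:]
    # Finally replace each filepos attribute with a normal fragment link
    return FILEPOS.sub(lambda m: b'href="#filepos%d"' % int(m.group(1)), text)

if __name__ == "__main__":
    print(rewrite_filepos_links(b"<a filepos=0000000028>go</a><h2>Ch</h2>"))
```

This assumes no target offset falls inside another `filepos=` token, which should not happen for offsets that point at the start of a tag.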

The problem lies in the fact that everyone seems to think it's alright to continue to author content in a format that has enormous limitations tied to the fact that it relies on a firmware data format that was never intended to be used the way it is now.

Seriously, think about it. Would you support an ebook format that wrapped data in an Apple II disk image or a Super Nintendo ROM? If you use .mobi you're doing something exactly parallel.

What's worse, the compression format they use is some ancient undocumented Palm thing; the only reason my program can read it at all is because of the work of a kind soul on CPAN who wrote a Perl script that can decode and parse it. Without that it would probably have taken me weeks to write a specification and implement it properly. I'm a busy man; I don't have 40-hour weeks to spend staring at Mobipocket's idea of a joke.
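For the record, the compression in most Mobipocket text records is PalmDOC compression, a simple LZ77 variant decoded one control byte at a time. The Python sketch below follows the commonly published description of that scheme; it does not handle the separate HUFF/CDIC compression some Mobipocket files use, nor any extra trailing bytes that some Mobipocket versions append to each text record (those would have to be stripped first).

```python
def palmdoc_decompress(data: bytes) -> bytes:
    """Decode one PalmDOC-compressed text record (LZ77-style scheme)."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if c == 0x00 or 0x09 <= c <= 0x7F:
            # Plain literal byte
            out.append(c)
        elif c <= 0x08:
            # 0x01-0x08: copy the next c bytes through unchanged
            out += data[i:i + c]
            i += c
        elif c <= 0xBF:
            # 0x80-0xBF: two-byte back-reference
            if i >= len(data):
                raise ValueError("truncated back-reference")
            pair = (c << 8) | data[i]
            i += 1
            distance = (pair >> 3) & 0x7FF  # 11 bits: how far back to look
            length = (pair & 0x07) + 3      # 3 bits: copy 3..10 bytes
            if distance == 0 or distance > len(out):
                raise ValueError("back-reference outside decoded text")
            for _ in range(length):
                # Byte by byte, so overlapping copies (distance < length) work
                out.append(out[-distance])
        else:
            # 0xC0-0xFF: a space followed by the character (c XOR 0x80)
            out.append(0x20)
            out.append(c ^ 0x80)
    return bytes(out)

if __name__ == "__main__":
    print(palmdoc_decompress(b"abc\x80\x18"))  # b'abcabc'
```

In the example, `0x80 0x18` packs distance 3 and length 3, so the decoder repeats the previous three bytes.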

The .epub format is trivial to write an interpreter for. It just has XHTML documents for the text itself, support files (CSS, JS, images in JPEG and GIF), and an easy-to-read XML manifest for the whole thing, all of it wrapped in a ZIP container. All totally industry-standard and the very same stuff we've been running the Web on since 2000.
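To illustrate how little is involved, here is a Python sketch (standard library only) that writes a bare-bones EPUB 2 container: the `mimetype` entry stored first and uncompressed, `META-INF/container.xml` pointing at an OPF package, and a single XHTML chapter. The file names under `OEBPS/` and the metadata values are arbitrary choices, and for brevity it omits the NCX table of contents that OPF 2.0 requires, so a validator such as epubcheck would flag it.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{identifier}</dc:identifier>
  </metadata>
  <manifest>
    <item id="chap1" href="chap1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="chap1"/>
  </spine>
</package>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{title}</title></head>
<body><p>{text}</p></body>
</html>
"""

def write_minimal_epub(target, title, identifier, text):
    """Write a one-chapter EPUB 2 skeleton to a path or binary file object."""
    title, identifier, text = escape(title), escape(identifier), escape(text)
    with zipfile.ZipFile(target, "w") as zf:
        # OCF rule: "mimetype" comes first and is stored uncompressed
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        files = {
            "META-INF/container.xml": CONTAINER_XML,
            "OEBPS/content.opf": CONTENT_OPF.format(title=title, identifier=identifier),
            "OEBPS/chap1.xhtml": CHAPTER_XHTML.format(title=title, text=text),
        }
        for name, body in files.items():
            zf.writestr(name, body, compress_type=zipfile.ZIP_DEFLATED)

if __name__ == "__main__":
    buf = io.BytesIO()
    write_minimal_epub(buf, "Test book", "urn:uuid:00000000-0000-0000-0000-000000000000", "Hello")
    print(zipfile.ZipFile(buf).namelist())
```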

So why hasn't Amazon pledged to support it in the Kindle?

What I really wanted to do was convert public-domain XML/SGML versions of ancient Greek and Latin texts into a format that the Kindle could understand so I don't have to carry irreplaceable books around with me. The former's been nixed by the Kindle's complete lack of UTF-8 support (precipitated in part, I shouldn't wonder, by the limitations of the Palm database format) and the latter just doesn't seem worth the effort considering that the conversion would dead-end.

HarryT
11-25-2007, 04:49 AM
The problem lies in the fact that everyone seems to think it's alright to continue to author content in a format that has enormous limitations tied to the fact that it relies on a firmware data format that was never intended to be used the way it is now.

It's not a matter of thinking that it's alright; it's a purely practical matter of saying "this is the de facto industry standard and it's what we have to live with, like it or not."

HarryT
11-25-2007, 05:30 AM
Oh, they have released a Linux version of these tools then. Or are you just assuming that everybody uses Windows?

Heavens no! My only assumption is that anyone to whom the creation of Mobi books is important will have equipped themselves with the tools to create them, whether that's by running natively under Windows, running mobigen under Wine, or whatever.

Creating my own books is absolutely vital to me, so I'd never buy a book reader which didn't allow me to create my own books. I don't run Linux (I have Macs and Windows machines) so I, for example, wouldn't buy any reader whose tools ran only under Linux. I'm assuming it's equally true that someone who didn't run Windows wouldn't buy a reader whose creation tools ran only under Windows. That seems pretty reasonable to me!

You have a CyBook Gen3, don't you? How do you create books for it?

tompe
11-25-2007, 07:49 AM
I would like to see an open source alternative to mobigen.exe, but the original may be "good enough". It takes an OEB document (html with an OPF file), or plain HTML, and produces a MOBI file. So the way to produce MobiPocket books is the same as for several of the older e-book formats - start with an actual OEB document and then convert it to the required bastardized commercial OEB variant.


I took your Alice example from the other thread and, strangely enough, reading in the HTML file and writing a mobi file using the Perl packages gave me a file where the TOC worked in FBReader. I wonder how it could work. The TOC did not work on my Gen3 but I could read the file.

This was kind of fun so I will play around with it and see what you can easily do.

tompe
11-25-2007, 07:53 AM
Creating my own books is absolutely vital to me, so I'd never buy a book reader which didn't allow me to create my own books. I don't run Linux (I have Macs and Windows machines) so I, for example, wouldn't buy any reader whose tools ran only under Linux. I'm assuming it's equally true that someone who didn't run Windows wouldn't buy a reader whose creation tools ran only under Windows. That seems pretty reasonable to me!

You have a CyBook Gen3, don't you? How do you create books for it?

I knew that HTML would work, so before I bought the Gen3 I did not check how to create books in Mobi format. I have tested running mobigen with wine and it kind of works, but I do not like to run programs that I do not have the source for. Now I have managed to generate books using Perl, so maybe I do not have to use mobigen anymore.

tompe
11-25-2007, 07:56 AM
What I really wanted to do was convert public-domain XML/SGML versions of ancient Greek and Latin texts into a format that the Kindle could understand so I don't have to carry irreplaceable books around with me. The former's been nixed by the Kindle's complete lack of UTF-8 support (precipitated in part, I shouldn't wonder, by the limitations of the Palm database format) and the latter just doesn't seem worth the effort considering that the conversion would dead-end.

Doesn't it work to encode the characters as entities? This works with mobigen and my Gen3, for some entities at least.
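Encoding characters as entities can be done mechanically: every character outside ASCII is replaced by a numeric character reference, which survives a toolchain that only handles a single-byte encoding. Whether the device can then actually draw the character still depends on its fonts. A minimal Python sketch (the function name is made up):

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with an &#NNNN; reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

if __name__ == "__main__":
    # First word of the Iliad, written with escapes to avoid any ambiguity
    # between precomposed and combining accents in the source file
    print(to_char_refs("\u03bc\u1fc6\u03bd\u03b9\u03bd"))
    # &#956;&#8134;&#957;&#953;&#957;
```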

Steven Lyle Jordan
11-25-2007, 09:04 AM
The problem lies in the fact that everyone seems to think it's alright to continue to author content in a format that has enormous limitations tied to the fact that it relies on a firmware data format that was never intended to be used the way it is now.

Well, not really. It's not that we think Mobi is anything great. But right now, Mobi is the bird in the hand, while ePub is the two birds in the bush. It's that simple. There are plenty of us here who'd love to see ePub become the de facto standard in e-books, hopefully soon.

So why hasn't Amazon pledged to support (ePub) in the Kindle?

We're pretty sure that Amazon:

Wanted to move on e-books now;
Didn't want to wait for ePub to become a standard;
Is presently unsure about the new format, and whether it will in fact be adopted by anyone else; and
May be too concerned about content lock to want to delve into a universal format anyway.


It would've been nice if Amazon had taken the initiative to drive ePub, but again, they are a commercial entity, devoted to profit, and they clearly made the decisions that they expect will get them the most profit they can.

What I really wanted to do was convert public-domain XML/SGML versions of ancient Greek and Latin texts into a format that the Kindle could understand so I don't have to carry irreplaceable books around with me.

That's great! Taking that kind of initiative to optimize the Kindle (or any e-book reader) for your use is just what we like to hear around here!

igorsk
11-25-2007, 09:04 AM
Mobi does support UTF-8 (-unicode switch of mobigen). I'm pretty sure Kindle can read those, but it's possible that the bundled fonts do not have Greek characters.
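Whether a particular book was built with UTF-8 can be checked in the MOBI header at the start of record 0. The sketch below relies on the layout worked out by the community (documented on the MobileRead wiki), not on anything Mobipocket published: record 0 starts with a 16-byte PalmDOC header whose first 16-bit field is the compression type (1 = none, 2 = PalmDOC, 17480 = HUFF/CDIC), followed by the "MOBI" magic at offset 16 and a 32-bit text-encoding field at offset 28 (1252 = CP1252, 65001 = UTF-8). `describe_record0` is a hypothetical helper.

```python
import struct

COMPRESSION = {1: "none", 2: "PalmDOC", 17480: "HUFF/CDIC"}
ENCODING = {1252: "cp1252", 65001: "utf-8"}

def describe_record0(record0: bytes) -> dict:
    """Report compression and text encoding from a Mobipocket record 0."""
    (compression,) = struct.unpack_from(">H", record0, 0)
    info = {"compression": COMPRESSION.get(compression, compression), "encoding": None}
    if record0[16:20] != b"MOBI":
        # No MOBI header (e.g. a plain PalmDOC book): no encoding field to read
        return info
    (encoding,) = struct.unpack_from(">I", record0, 28)
    info["encoding"] = ENCODING.get(encoding, encoding)
    return info

if __name__ == "__main__":
    # Synthetic record 0, not taken from a real book
    rec = bytearray(32)
    struct.pack_into(">H", rec, 0, 2)
    rec[16:20] = b"MOBI"
    struct.pack_into(">I", rec, 28, 65001)
    print(describe_record0(bytes(rec)))
```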

JSWolf
11-25-2007, 09:19 AM
I'm thinking they're just following the good old rule: "If it ain't broken, don't fix it (http://www.everything2.com/index.pl?node=if%20it%20ain't%20broke%2C%20don't%20fix%20it)".
But to be honest, it is broken. If you have an E Ink device with a 6" screen or even an iLiad with a larger screen, you can find some of the Mobipocket format books to be totally useless. Mobipocket is a format created originally for PDA-sized screens. If you look at Mobi files that contain images, you will find most of the images are tiny, in some cases too small to be of any use on a larger screen. If these images are important for the book you are trying to read, you will find the book to be useless, as the images will be small and possibly fuzzy. I've seen this problem, and the thing to do is to purchase MS Reader format books with images and convert.

HarryT
11-25-2007, 09:31 AM
Mobi does support UTF-8 (-unicode switch of mobigen). I'm pretty sure Kindle can read those, but it's possible that the bundled fonts do not have Greek characters.

Latin and Greek texts are of great interest to me, too, as you may have seen from some of my past postings.

Latin is no problem, but for ancient Greek you really need to use a format such as PDF which supports embedded fonts, and embed a suitable font into the document. Unicode supports modern Greek, but not the accents and breathing marks required for ancient Greek.

Mobi isn't a good choice of format for ancient Greek.

akiburis
11-25-2007, 09:46 AM
Unicode fully supports ancient Greek, of course. What Mobi supports is a different matter.
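akiburis's point is easy to check: Unicode's Greek Extended block has precomposed polytonic letters, and each one also decomposes canonically into a base letter plus combining breathing and accent marks. A quick check with Python's standard `unicodedata` module (the `describe` helper is made up); whether a reader's bundled fonts include these glyphs is a separate question.

```python
import unicodedata

def describe(ch: str) -> list:
    """Name each code point in the canonical (NFD) decomposition of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

if __name__ == "__main__":
    alpha = "\u1f04"
    print(unicodedata.name(alpha))
    # GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA
    print(describe(alpha))
    # ['GREEK SMALL LETTER ALPHA', 'COMBINING COMMA ABOVE', 'COMBINING ACUTE ACCENT']
```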

HarryT
11-25-2007, 09:57 AM
Unicode fully supports ancient Greek, of course. What Mobi supports is a different matter.

But the typical user is highly unlikely to have a font with the appropriate characters in it. What I meant was that the only way to guarantee that the text will be readable by the end user is to embed the font in the document and (AFAIK) Mobi doesn't support font embedding, unlike PDF (and the Sony Reader too).

tompe
11-25-2007, 10:40 AM
You know, a large number of people try to claw their eyes out when they first see the insides of a Mobipocket file. I'm glad it didn't happen to you. ;)


I think I am missing something here. I unpacked some Mobipocket files from Manybooks and the HTML looked OK, so I assume the HTML does not have to be bad. Then I created a Mobipocket file which contained external references to image files, and in FBReader that worked. But did I really have a Mobipocket file then?

So when Bookeen says that Gen3 supports Mobipocket how do I know what html will work on Gen3? Will things not in the Mobipocket format work if the Gen3 reader happens to understand the unpacked html file?

What I am wondering is if you can take any html file and pack it into the container and have a Mobipocket file or do you have to convert the HTML first to some Mobipocket HTML?

Anyway, my html2mobi script can now take a list of HTML files, automatically generate a table of contents, and then create one HTML file that is then converted to a mobi file. The next step is to extend this to follow links to files so I can convert a tree fetched with wget.
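A rough sketch of the "list of HTML files to one HTML file with a TOC" step, in Python rather than Perl and not tompe's actual script. Assumptions, stated loudly: the chapters are already HTML fragments; TOC links use Mobipocket-style `filepos` byte offsets pointing at each chapter heading; and offsets are written as zero-padded ten-digit numbers so the table of contents has the same length whatever the offsets turn out to be, which avoids a second pass (whether every reader accepts padded offsets is an assumption). `<mbp:pagebreak/>` is Mobipocket's page-break tag; `build_single_html` is a hypothetical name.

```python
from html import escape

PAGEBREAK = b"<mbp:pagebreak/>"

def build_single_html(chapters):
    """chapters: list of (title, body_html) pairs -> one HTML document as bytes.

    Each TOC link is filepos=<byte offset of that chapter's <h2> heading>.
    """
    head = b"<html><head></head><body><h1>Contents</h1>"
    tail = b"</body></html>"
    toc_line = '<p><a filepos={:010d}>{}</a></p>'
    # Fixed-width offsets mean the TOC length does not depend on their values
    toc_len = sum(len(toc_line.format(0, escape(t)).encode("utf-8")) for t, _ in chapters)
    toc_len += len(PAGEBREAK)
    pieces, offsets = [], []
    pos = len(head) + toc_len
    for title, body in chapters:
        chunk = f"<h2>{escape(title)}</h2>{body}".encode("utf-8") + PAGEBREAK
        offsets.append(pos)  # byte offset where this chapter's heading starts
        pieces.append(chunk)
        pos += len(chunk)
    toc = b"".join(toc_line.format(off, escape(t)).encode("utf-8")
                   for off, (t, _) in zip(offsets, chapters)) + PAGEBREAK
    return head + toc + b"".join(pieces) + tail

if __name__ == "__main__":
    print(build_single_html([("One", "<p>a</p>"), ("Two", "<p>b</p>")]).decode("utf-8"))
```

The result would still need to be compressed and wrapped in a PDB container before a reader would open it as a Mobipocket book.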

wallcraft
11-25-2007, 11:16 AM
The .epub format is trivial to write an interpreter for. It just has XHTML documents for the text itself, support files (CSS, JS, images in JPEG and GIF), and an easy-to-read XML manifest for the whole thing, all of it wrapped in a ZIP container. All totally industry-standard and the very same stuff we've been running the Web on since 2000.

So why hasn't Amazon pledged to support it in the Kindle?

EPUB also allows "images" in SVG format, which is perhaps pushing the "Web 2000" envelope a bit (but a good addition even so).

Amazon/MobiPocket appears to intend to support EPUB the way they currently support CHM (say). It will be imported into Windows MobiPocket Reader (only) and converted to MOBI. The Kindle could do a similar conversion on Amazon's servers. This is technically a terrible approach, even if MobiPocket upgrades the MOBI format to provide more EPUB compatibility. However, it is understandable from MobiPocket's perspective. When you have support for multiple device types, how do you switch them to a new format? The only possible answer (given limited resources, and an existing code base that probably was not designed for extensibility) is first to convert to the old format and second to upgrade the software on each device (one by one) to read the new format natively. I just hope that MobiPocket will get to step two.

The sluggishness of Amazon may provide an opportunity for others. For example, if ETI could productize an e-ink reader that reads EPUB like their prototype (http://www.mobileread.com/forums/showthread.php?t=13786) apparently does then they might gain a significant advantage. Even Adobe might have a chance if they ever work out how to design a reader interface for Digital Editions that does not suck.

jbenny
11-25-2007, 11:29 AM
But the typical user is highly unlikely to have a font with the appropriate characters in it. What I meant was that the only way to guarantee that the text will be readable by the end user is to embed the font in the document and (AFAIK) Mobi doesn't support font embedding, unlike PDF (and the Sony Reader too).

epub supports font embedding, also. When the Sony has Digital Editions and when the Cybook supports epub, ancient Greek will be no problem.

HarryT
11-25-2007, 11:37 AM
epub supports font embedding, also. When the Sony has Digital Editions and when the Cybook supports epub, ancient Greek will be no problem.

It's not a problem on the Sony as it is, because Sony's LRF format supports font embedding. I uploaded an ancient Greek version of book 1 of Homer's "Odyssey" to the "Book Uploads" section a few months ago as an example of this.

akiburis
11-25-2007, 12:57 PM
Here's a point I'd like to make, even if it's not entirely relevant to this thread. Why the endlessly reiterated insistence, in these forums, that PDF is an awful, horrible format for ebooks? The LRF format may support font embedding, but the applications for generating LRF output (I say as a bystander, not having given one a good try myself) seem to have severe typographic limitations. So I think PDF support is a necessity in an ebook (or etext) reading device, unless all you want is something like what I take the Kindle to be, a sort of pricey, branded shopping bag for commercial ebooks in this or that currently popular crippled format.

schmidt349
11-25-2007, 01:02 PM
Latin and Greek texts are of great interest to me, too, as you may have seen from some of my past postings.

Latin is no problem, but for ancient Greek you really need to use a format such as PDF which supports embedded fonts, and embed a suitable font into the document. Unicode supports modern Greek, but not the accents and breathing marks required for ancient Greek.


PDF isn't a good choice of format for anything except replicating paper documents, which is a fine application and very useful but not so much so for e-texts. If you use JSTOR to get your journal fix you know what I mean: they do a really phenomenal job of replicating exactly what you'd get if you had access to the journals in print, but you can't really do anything cool and electronic with them. Reflowing, for instance, is key for electronic distribution, for which reason the "page" as a milestone has to go away.

Many journals, especially in the sciences, use TeX for typesetting, so there's a very easy and rapid path to reflowable text for them.

The Perseus project just made all of their public-domain texts available in XML, which is the academic standard for manipulation of documents. As an interchange format you really don't get much better. The bottom line is that I need to transform these documents via XSLT into something suitable for e-readers, but the Kindle throws up way too many roadblocks.

Most of the big document databases (see a big listing here (http://www.lib.uchicago.edu/e/ets/efts/ARTFL.html)) use XML for document interchange for the best of reasons. So what you have here is the e-book world being completely divorced from the academic world, which is probably not a good means for ensuring its long-term survivability.

Sigh.

schmidt349
11-25-2007, 01:04 PM
It's not a problem on the Sony as it is, because Sony's LRF format supports font embedding. I uploaded an ancient Greek version of book 1 of Homer's "Odyssey" to the "Book Uploads" section a few months ago as an example of this.

Which edition? Allen? If so it's not terribly useful to anyone except the amateur (which is great, by the way!).

Where'd you get it, by the way? You didn't keyboard it yourself, did you? That would have been a little hard on the wrists. :huh:

schmidt349
11-25-2007, 01:07 PM
Here's a point I'd like to make, even if it's not entirely relevant to this thread. Why the endlessly reiterated insistence, in these forums, that PDF is an awful, horrible format for ebooks? The LRF format may support font embedding, but the applications for generating LRF output (I say as a bystander, not having given one a good try myself) seem to have severe typographic limitations. So I think PDF support is a necessity in an ebook (or etext) reading device, unless all you want is something like what I take the Kindle to be, a sort of pricey, branded shopping bag for commercial ebooks in this or that currently popular crippled format.

It's not reflowable, it's impossible to index properly, screen readers can't do anything with it, and it's really really hard to cache multiple pages rapidly unless you have some serious processing oomph at your back. Hence it's not really suitable for electronic book devices.

The horrible, horrible workaround to some of these problems that some outfits have been using is to back the page images with a dirty OCR. It doesn't work. At all.

Steven Lyle Jordan
11-25-2007, 01:12 PM
Here's a point I'd like to make, even if it's not entirely relevant to this thread. Why the endlessly reiterated insistence, in these forums, that PDF is an awful, horrible format for ebooks?

In a nutshell: PDF has had two main issues related to e-books.

1. PDFs are originally and chiefly designed to maintain the formatting of a document for printing. That means it tends to be a large file, especially when images are involved. The first e-book readers and handheld devices were severely limited in storage space and RAM, and PDFs were almost impossible to read on many devices. As Acrobat has developed further, PDFs have become bloated documents that tend to bog down even the best reading devices short of a full PC.

2. PDF is set to a particular page size, usually Letter or A4, which rarely matches an e-book reader's screen size. Some devices, like Windows-based handhelds, can reflow and resize the text in a PDF to fit the smaller screen. But many other devices cannot reflow or resize PDF text. As a result, your PDF page is either a postage stamp too tiny to read when it is "fit" onto the screen, or it must be scrolled left-to-right, then down, to read every line.

The second point is most important these days. Generally what you find is that someone with a dedicated e-book reader must prepare their own PDF files from the original document, at the size specific to their device, in order to make it readable. (I now offer my e-books in RTF, for instance, to facilitate this process for those who desire to do so.) But most e-book reading people try to avoid the hassle of prepping and using PDFs for e-book reading.
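To put rough numbers on the "postage stamp" problem in point 2, here is a small worked calculation. It assumes a 6-inch diagonal screen at 600x800 pixels (typical of the E-Ink readers discussed in this thread) and a US Letter page; the function name is illustrative only.

```python
import math


def fit_scale(page_w_in, page_h_in, screen_w_px, screen_h_px, diagonal_in):
    """Return (dpi, scale) for shrinking a whole printed page onto the screen."""
    # Pixel density follows from the resolution and the physical diagonal.
    dpi = math.hypot(screen_w_px, screen_h_px) / diagonal_in
    screen_w_in = screen_w_px / dpi
    screen_h_in = screen_h_px / dpi
    # "Fit page" is limited by whichever dimension is more constrained.
    scale = min(screen_w_in / page_w_in, screen_h_in / page_h_in)
    return dpi, scale


if __name__ == "__main__":
    dpi, scale = fit_scale(8.5, 11.0, 600, 800, 6.0)
    print(f"{dpi:.1f} dpi; the page is shown at {scale:.0%} of its printed size")
    print(f"10 pt body text appears roughly {10 * scale:.1f} pt tall")
```

Under these assumptions the page is shown at about 42% of its printed size, so 10 pt body text shrinks to roughly 4 pt, which is why a PDF has to be re-made at the device's own page size to be comfortably readable.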

HarryT
11-25-2007, 02:37 PM
Which edition? Allen? If so it's not terribly useful to anyone except the amateur (which is great, by the way!).

Where'd you get it, by the way? You didn't keyboard it yourself, did you? That would have been a little hard on the wrists. :huh:

Good heavens no :). It's a version downloaded from Perseus. I am strictly an amateur classicist, and these days rely primarily on on-line sources for my texts (mainly Perseus for Greek, The Latin Library for Latin).

Steven Lyle Jordan
11-26-2007, 01:28 PM
But to be honest, it is broken. If you have an E-Ink device with a 6" screen, or even an iLiad with a larger screen, you can find some Mobipocket format books to be totally useless. Mobipocket is a format created originally for PDA-sized screens. If you look at Mobi files that contain images, you will find most of the images are tiny; in some cases, too small to be of any use on a larger screen. If these images are important for the book you are trying to read, you will find the book useless, as the images will be small and possibly fuzzy. I've seen this problem, and the thing to do is to purchase MS Reader format books with images and convert them.

This doesn't mean Mobi is "broken," just that larger devices should be using a format better suited for it... like ePub.

Hadrien
11-26-2007, 01:33 PM
Lack of stylesheet support and embedded fonts is a pretty huge problem for an e-book format.

I wonder whether all those publishers who created e-books for the Kindle created Mobipocket versions directly, or epub versions of these books. If Amazon asked for Mobipocket instead of epub, this is a complete waste of time for the publishers...

Steven Lyle Jordan
11-26-2007, 01:41 PM
I wonder whether all those publishers who created e-books for the Kindle created Mobipocket versions directly, or epub versions of these books. If Amazon asked for Mobipocket instead of epub, this is a complete waste of time for the publishers...

I know I've uploaded my books to Kindle in Mobi format, and the only thing I changed was the size of the cover (larger). However, they'll also accept Word DOC and HTML, and possibly other formats, so you're not limited to Mobi. I don't know whether Amazon will accept ePub files; I've seen no reference to them on the site.

Of course, once Amazon has them, they are converted to HTML, so you have the makings of an ePub file anyway.

wallcraft
11-26-2007, 01:53 PM
I wonder whether all those publishers who created e-books for the Kindle created Mobipocket versions directly, or epub versions of these books.

MobiPocket Creator uses OEB as its base format and only produces a MOBI file once the e-book is completed. Similarly, mobigen.exe can take the .opf file from an OEB e-book as its starting point. I don't think major publishers are likely to use Creator exclusively, but it may be at the end of their production chain for MOBI e-books. A MOBI e-book with JPEG images is essentially an AZW e-book, so that may be the "best" upload option to Amazon.

JSWolf
11-26-2007, 02:20 PM
This doesn't mean Mobi is "broken," just that larger devices should be using a format better suited for it... like ePub.
But why support a format that is designed for small screens when you have a larger screen?

Steven Lyle Jordan
11-26-2007, 03:21 PM
But why support a format that is designed for small screens when you have a larger screen?

Well, you've got me there. I'm sure Amazon wanted an existing format (less trouble), and no other formats satisfied them. Maybe others might have worked better, such as MS Reader, LRF, or even PDF, but licensing them from MS, Adobe or Sony would have been too expensive/too much hassle.

I can only guess Mobi was the most convenient ready-made format that they could obtain and use quickly, plain and simple.

Hadrien
11-26-2007, 03:35 PM
Creating an epub file and a Mobipocket file is quite different from a publisher point of view.

For the epub version, they can embed and use fonts like on a real book, add some extra formatting and support a lot more metadata.

That's the main reason why it would be much better if publishers created their e-books using epub and Amazon then converted these files to Mobipocket. Part of the information is lost during this process, but at least these books are ready for epub or any other format.

Steven Lyle Jordan
11-26-2007, 04:36 PM
That's the main reason why it would be much better if publishers created their e-books using epub and Amazon then converted these files to Mobipocket. Part of the information is lost during this process, but at least these books are ready for epub or any other format.

I think that logic goes pretty much for all publishers and e-book formats.

igorsk
11-26-2007, 04:38 PM
I don't think Mobi was specifically "designed" for small screens. It's just where it was mostly used and that's why most mobi files tend to include "safe-sized" pictures. If you include bigger pictures, it will look fine on bigger screens too.

JSWolf
11-27-2007, 07:39 PM
If you have a Mobi format book with, say, images sized for a 6" E-Ink screen and you try to read this book on a PDA, will the images be too large, or will they be resized for the smaller screen?

wallcraft
11-27-2007, 07:52 PM
MobiPocket's documentation indicates that large images will be resized. I confirmed that this was the case if you made the window very small on a PC, but I don't have a PDA to test on. See Images in MobiPocket (http://www.mobileread.com/forums/showthread.php?t=14641).

A related issue is which devices support JPEG images in MOBI files, since these are typically needed to get large images. The "StandMars" PRC files in the above thread should work on the Kindle (mine arrives tomorrow), and can be used to test MobiPocket's PDA reader software for JPEG support.

schmidt349
11-27-2007, 11:38 PM
There's a built-in four-gigabyte size limit to the PDB (record offsets are stored as 32-bit values), but I doubt that's going to be a limitation for most books going forward. Probably the constraining factor is the 16-bit record count, which caps a PDB at 65,535 records (data packets, quasi-files; it seems as though Mobipocket splits biggish files into several of these, possibly for faster indexing).

The format doesn't seem to impose any inherent limit on the size of each record, though.
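Both limits fall out of the Palm database (PDB) header, where every integer is stored big-endian. The sketch below assumes the commonly documented layout: a 78-byte header whose type and creator fields hold "BOOK" and "MOBI" (the "BOOKMOBI" string that `strings` turns up), ending in a 16-bit record count, followed by one 8-byte entry per record holding a 32-bit file offset. No record sizes are stored; each record simply runs to the next one's offset, which is why there is no per-record size limit. The helper names are my own, and the code ignores the MOBI and EXTH headers inside record 0.

```python
import struct

# 78-byte PDB header ('>' = big-endian): name, attributes, version, six 32-bit
# fields (dates, modification number, appInfo/sortInfo offsets), type, creator,
# uniqueIDseed, nextRecordListID, numRecords.
PDB_HEADER = struct.Struct(">32sHHIIIIII4s4sIIH")
# One entry per record: 32-bit offset, attribute byte, 3-byte unique ID.
RECORD_ENTRY = struct.Struct(">IB3s")


def build_pdb(name, records):
    """Pack byte strings into a minimal BOOK/MOBI-typed PDB container."""
    if len(records) > 0xFFFF:
        raise ValueError("numRecords is 16-bit: at most 65,535 records")
    header = PDB_HEADER.pack(name.encode("latin-1")[:31], 0, 0, 0, 0, 0, 0, 0, 0,
                             b"BOOK", b"MOBI", 0, 0, len(records))
    # Data conventionally starts after the record list plus a 2-byte gap.
    offset = PDB_HEADER.size + len(records) * RECORD_ENTRY.size + 2
    entries = bytearray()
    for i, rec in enumerate(records):
        if offset > 0xFFFFFFFF:
            raise ValueError("offsets are 32-bit: the file cannot exceed 4 GB")
        entries += RECORD_ENTRY.pack(offset, 0, i.to_bytes(3, "big"))
        offset += len(rec)
    return header + bytes(entries) + b"\0\0" + b"".join(records)


def read_pdb(data):
    """Return (name, type+creator, list of record byte strings)."""
    fields = PDB_HEADER.unpack_from(data, 0)
    name = fields[0].split(b"\0", 1)[0].decode("latin-1")
    type_creator = fields[9] + fields[10]
    count = fields[13]
    offsets = [
        RECORD_ENTRY.unpack_from(data, PDB_HEADER.size + i * RECORD_ENTRY.size)[0]
        for i in range(count)
    ]
    # Each record ends where the next one begins (the last one at end of file).
    ends = offsets[1:] + [len(data)]
    return name, type_creator, [data[s:e] for s, e in zip(offsets, ends)]
```

For example, `read_pdb(build_pdb("Kindle_Users_Guide", [b"hello", b"world!"]))` round-trips the name, the `b"BOOKMOBI"` type/creator pair and both records.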

wallcraft
11-28-2007, 12:01 AM
I realized I now have the Palm MobiPocket Reader on my Nokia 770, and it seems to handle large JPEG images ok - rescaling them to fit the screen. Note that the Palm window is only 320x480 (total screen is 800x480).

DaleDe
11-28-2007, 12:28 PM
Lack of stylesheet support and embedded fonts is a pretty huge problem for an e-book format.

I wonder whether all those publishers who created e-books for the Kindle created Mobipocket versions directly, or epub versions of these books. If Amazon asked for Mobipocket instead of epub, this is a complete waste of time for the publishers...

I believe they have stylesheets but not CSS. XSL perhaps? (Don't remember the acronym)

Dale

Steven Lyle Jordan
11-28-2007, 05:26 PM
I wonder whether all those publishers who created e-books for the Kindle created Mobipocket versions directly, or epub versions of these books. If Amazon asked for Mobipocket instead of epub, this is a complete waste of time for the publishers...

Just today I uploaded 2 more books to Amazon, in the original Word DOC format, and they worked fine (based on the previews). Amazon gives you a choice of formats to upload and convert, though I don't know how many formats they can handle.

JSWolf
11-28-2007, 06:12 PM
Since the Mobi software resizes the images, why the heck do they not make the books with large images and let the software resize as needed? This makes no sense and actually ruins a number of books that need viewable images. May Mobipocket rot for that.

Steven Lyle Jordan
11-29-2007, 12:20 AM
Since the Mobi software resizes the images, why the heck do they not make the books with large images and let the software resize as needed? This makes no sense and actually ruins a number of books that need viewable images. May Mobipocket rot for that.

Actually, Mobi files will resize large images down (like, to a PDA). I believe the only real issue there is the increase in file size that you get with big photos, something that older PDAs and other devices were known to choke on.

kovidgoyal
11-29-2007, 01:31 AM
Since the Mobi software resizes the images, why the heck do they not make the books with large images and let the software resize as needed? This makes no sense and actually ruins a number of books that need viewable images. May Mobipocket rot for that.

The more important question is why the heck are Bookeen, Amazon and iRex all using it as their primary ebook format?

DaleDe
11-29-2007, 02:17 AM
The more important question is why the heck are Bookeen, Amazon and iRex all using it as their primary ebook format?

Because images are not the primary purpose of novels? Mobi files can be made to have larger images but it takes some effort.

Dale

kovidgoyal
11-29-2007, 02:41 AM
It has various other problems...I've listed them before.

JSWolf
11-29-2007, 02:53 AM
Because images are not the primary purpose of novels? Mobi files can be made to have larger images but it takes some effort.

Dale

Some books need images to be clearly viewable. If the images are too small, the book is totally ruined. Try telling that to someone who just paid good money for a book that is useless because of old technology.

wallcraft
11-29-2007, 10:30 AM
MobiPocket has taken the approach of fixing the small image problem by switching to JPEG images, but that leaves all the existing e-books with tiny GIF images (and many new ones as well, because publishers are slow to switch - in part because of compatibility concerns). MobiPocket should add a fix for legacy e-books, by scaling up small images. This won't be perfect, but better on large screens than leaving them unscaled.
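Shrinking an oversized image to the reader window and wallcraft's proposed enlarging of small legacy GIFs are the same aspect-ratio-preserving fit; the only difference is whether the scale factor may exceed 1. A minimal sketch, with a hypothetical function and `allow_upscale` flag that are not part of any MobiPocket reader:

```python
def fit_image(img_w, img_h, win_w, win_h, allow_upscale=False):
    """Scale an image to fit inside the window while keeping its aspect ratio."""
    # The tighter of the two constraints decides the scale factor.
    scale = min(win_w / img_w, win_h / img_h)
    if not allow_upscale:
        # Current behaviour described in the thread: only ever shrink.
        scale = min(scale, 1.0)
    return max(1, round(img_w * scale)), max(1, round(img_h * scale))
```

With the 320x480 Palm window wallcraft mentions, `fit_image(1200, 1600, 320, 480)` gives `(320, 427)`. A 200x300 legacy GIF on a 600x800 screen stays at `(200, 300)` by default, but becomes `(533, 800)` with `allow_upscale=True`; the enlarged version will look soft, which is the trade-off JSWolf raises below.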

Part of the problem is that the MobiPocket Java Reader software used by the iLiad and Kindle is the least developed of all MobiPocket's Readers. Other areas where MobiPocket's Java-based Reader is behind the times: can't show large images at their original size, can't set margins, can't change line spacing, can't change fonts (on some devices). The Cybook's reader probably isn't the Java version, and it does have more features. Perhaps Bookeen could take the lead and demonstrate that scaling up small images is a worthwhile thing to do.

HarryT
11-29-2007, 11:16 AM
The non-eInk versions of the Mobi Reader have an "image mode" which allows you to zoom in on, and (on a PDA) pan around an image. Works very well for things like maps.

JSWolf
11-29-2007, 11:17 AM
The problem is that some of these small images look awful when resized. 8-bit GIF images do not always scale well. I used Photoshop to resize some images pulled from a Mobipocket ebook and they were not good. So yes, it can be done. Will you like it? Probably not. The only way to solve the problem is to go back and redo all the books that have images besides the cover. The best solution would be a reworking of the reader so it's up to the standards of today's larger screens.

HarryT
11-29-2007, 11:18 AM
Because images are not the primary purpose of novels? Mobi files can be made to have larger images but it takes some effort.

Dale

And the Mobi Reader does have some very nice features which others lack, such as dictionary lookup.

DaleDe
11-29-2007, 11:50 AM
Some books need images to be clearly viewable. If the images are too small, the book is totally ruined. Try telling that to someone who just paid good money for a book that is useless because of old technology.

Certainly that is true, and I never said it was not. I was just explaining the position, not defending it. Images in novels are not frequent, but other books depend heavily on them. Not all ebook formats work well with all genres. Try manga in a traditional reader.

Dale

Steven Lyle Jordan
11-29-2007, 11:53 AM
Try manga in a traditional reader.

Has anyone tried manga on the Kindle? Just curious.

JSWolf
11-29-2007, 11:55 AM
Has anyone tried manga on the Kindle? Just curious.
I've seen photos of manga on a 505. Given that it's the same screen, I would say manga would work quite well on any reader that uses the 6" Vizplex screen. But, the 505 does have the advantage of 8 shades vs. 4 for less dithering and a nicer image.
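A rough way to see why 8 grey shades need less dithering than 4: snap every 8-bit grey value to the nearest displayable shade and look at the worst-case error that dithering would have to hide. This sketch assumes evenly spaced shades, an idealization of real E-Ink panels.

```python
def grey_palette(levels):
    """Evenly spaced grey values from black (0) to white (255)."""
    return [round(k * 255 / (levels - 1)) for k in range(levels)]


def worst_case_error(levels):
    """Largest distance from any 8-bit grey value to its nearest shade."""
    pal = grey_palette(levels)
    return max(min(abs(v - p) for p in pal) for v in range(256))


if __name__ == "__main__":
    for n in (4, 8):
        print(n, grey_palette(n), worst_case_error(n))
```

Four shades leave errors of up to 42 (out of 255); eight shades cut that to 18, so much less of the picture has to be faked with dither patterns.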

Steven Lyle Jordan
11-29-2007, 12:09 PM
It has various other problems...I've listed them before.

Right. I'm sure the decision was simply based on commercial/financial, not technical, issues.

tompe
11-29-2007, 04:37 PM
MobiPocket has taken the approach of fixing the small image problem by switching to JPEG images, but that leaves all the existing e-books with tiny GIF images (and many new ones as well, because publishers are slow to switch - in part because of compatibility concerns). MobiPocket should add a fix for legacy e-books, by scaling up small images. This won't be perfect, but better on large screens than leaving them unscaled.

MobiPocket's demo example has a GIF that is 600x800, and that works OK on smaller devices. Do you mean that JPEG enables the use of images larger than 600x800?

wallcraft
11-29-2007, 04:54 PM
The size in pixels isn't the issue; it is the image size in KB that matters. For full color images, the average size of a 64 KB GIF is about 200x300, but this size is highly variable and some 600x800 GIFs will fit in 64 KB. By default, the mobigen.exe program reduces the size of GIF images so that they fit in 64 KB by shrinking the image. If you use mobigen.exe -jpeg (and JPEG images), it reduces the size in bytes of the image by reducing its quality while maintaining its size in pixels. Since JPEGs are intrinsically more efficient than GIFs for many images, the 64 KB limit is less onerous for JPEGs.

I have screenshots showing the difference between LIT (unlimited size JPEGs) and PRC (limited byte size GIF/BMP) at the start of Images in MobiPocket (http://www.mobileread.com/forums/showthread.php?t=14641). If you use mobigen.exe -jpeg on the LIT file you get essentially the same result as the original, and if you use mobigen.exe (without the -jpeg) on the LIT file I expect you would get essentially the same result as Baen's original PRC (very small images).
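The GIF behaviour wallcraft describes (shrink until the encoded image fits in 64 KB) can be sketched as a binary search over the scale factor. This is an illustration, not mobigen.exe's actual algorithm: `encoded_size` is a caller-supplied stand-in for a real GIF encoder and is assumed to grow with pixel count.

```python
RECORD_LIMIT = 64 * 1024  # per-image byte budget described above


def largest_fitting_scale(encoded_size, width, height, limit=RECORD_LIMIT, steps=20):
    """Binary-search the largest scale in (0, 1] whose encoded image fits in `limit` bytes.

    `encoded_size(w, h)` returns a byte count and must be non-decreasing in
    image size; a JPEG path would instead hold w, h fixed and search quality.
    """
    if encoded_size(width, height) <= limit:
        return 1.0  # already small enough: keep the original dimensions
    lo, hi = 0.0, 1.0  # invariant: scale lo fits (or is 0), scale hi does not
    for _ in range(steps):
        mid = (lo + hi) / 2
        w, h = max(1, int(width * mid)), max(1, int(height * mid))
        if encoded_size(w, h) <= limit:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Toy assumption: a busy full-colour GIF costs about one byte per pixel.
    toy_gif = lambda w, h: w * h
    s = largest_fitting_scale(toy_gif, 600, 800)
    print(f"600x800 shrinks to about {int(600 * s)}x{int(800 * s)}")
```

With that one-byte-per-pixel toy model, a 600x800 image comes out at about 221x295, in line with the "about 200x300" average quoted above.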

jharker
11-30-2007, 07:08 PM
The more important question is why the heck are Bookeen, Amazon and iRex all using it as their primary ebook format?
I'm not sure I would say that mobi is the iRex iLiad's primary format. The iLiad only got support for mobi fairly recently, and only (I assume) because it's the de facto ebook standard and iRex wants to support it.

The iLiad's primary format is pdf -- it was supported since day one. The iLiad can't reflow pdfs, but with its larger screen and stylus navigating letter/A4-sized pdfs is not too difficult, and they display pretty well.

kovidgoyal
11-30-2007, 08:09 PM
I'm not sure I would say that mobi is the iRex iLiad's primary format. The iLiad only got support for mobi fairly recently, and only (I assume) because it's the de facto ebook standard and iRex wants to support it.

The iLiad's primary format is pdf -- it was supported since day one. The iLiad can't reflow pdfs, but with its larger screen and stylus navigating letter/A4-sized pdfs is not too difficult, and they display pretty well.
A good point; unfortunately, encouraging people to keep ebooks in pdf format is even worse than encouraging them to use .mobi

wallcraft
11-30-2007, 08:48 PM
The Kindle is using MOBI because Amazon owns the format. The reason that the iLiad and Cybook are using MOBI is that there are no alternatives. No other vendor of DRMed e-books yet has a Linux Reader, and the only other announced Reader is Adobe Digital Editions which is "coming soon" for Linux.

In defense of MOBI, I would say that it is primarily a format for displaying e-books - rather than a format for laying out and archiving e-books. Even in this role it has limitations. Its real test will be whether it can adapt to displaying epub e-books.

jharker
11-30-2007, 09:58 PM
A good point; unfortunately, encouraging people to keep ebooks in pdf format is even worse than encouraging them to use .mobi
Oh, yes, I completely agree. I didn't mean pdf is a good ebook format, I just meant that mobi isn't really the iLiad's "default" format.

Actually, it's kind of funny: the iLiad supported pdf first because it was originally designed for use by business people, who usually read documents in pdf format. iRex added mobi support later, probably as a response to the large number of users buying the iLiad for use as an e-book reader. But it wasn't designed for e-book reading in the first place... :D

In general I have to say that a modern, extensible, reflowable standard format for ebooks would be A Good Thing.

I wouldn't worry about mobi sticking around in the long term. Considering that the mobi format has no embedded font support, I expect we'll drop it as soon as e-books become popular in non-English languages...

schmidt349
12-02-2007, 01:53 PM
The reason that the iLiad and Cybook are using MOBI is that there are no alternatives.

EPUB, XHTML/CSS, etc. In fine, there are plenty of reflowable text formatting schemes that are a heck of a lot easier to implement than Mobi. At the very least these are formats that don't rely on big-endian byte ordering :-D

wallcraft
12-02-2007, 03:01 PM
I agree that MOBI isn't the easiest format to implement. Linux (or any O/S) would not be an issue if DRM was not in the picture. Only MobiPocket has Linux Reader software with DRM at present (Sony does too, but only on their own devices). This may be a marketing, rather than a technical, issue - but it is a real constraint on E-Ink devices.

HarryT
12-03-2007, 04:29 AM
I agree that MOBI isn't the easiest format to implement. Linux (or any O/S) would not be an issue if DRM was not in the picture. Only MobiPocket has Linux Reader software with DRM at present (Sony does too, but only on their own devices). This may be a marketing, rather than a technical, issue - but it is a real constraint on E-Ink devices.

I'm afraid I disagree that this was the prime motivating factor for iRex and Bookeen to go with Mobi. The prime reason, almost certainly, was the fact that Mobi are by far the market leaders in the eBook world - they have more stores and more books than anyone else. What would be the point in releasing an eBook reader which had no books available for it? By going with Mobi, they've given their customers a vast range of books to choose from.

The typical customer doesn't give two hoots if the file format is technically a poor one; all they care about is whether or not they'll be able to buy the books they want to read. Mobi gives them that.

schmidt349
12-03-2007, 10:24 AM
Yeah, until their DRM server crashes and doesn't come back up again this time so all your books are bricked.

HarryT
12-03-2007, 10:29 AM
Yeah, until their DRM server crashes and doesn't come back up again this time so all your books are bricked.

Mobi's server crashing would not have the slightest effect on existing books; it would simply prevent you from buying new books, or encoding books that you've bought previously for a new device.

Since Mobi have been around longer than virtually anyone else in the business, I don't personally regard this as something that's likely to happen any time in the foreseeable future.

Since I rarely re-read books, I regard this as a complete "non issue". I buy books to read NOW - I don't particularly care whether or not I'll be able to re-read them in 20 years' time. How many of the books that you bought 20 years ago do you still have, and read, now?

DaleDe
12-03-2007, 10:31 AM
Yeah, until their DRM server crashes and doesn't come back up again this time so all your books are bricked.

This is misinformation. They did have a problem with a security breach at one point and shut down the server to fix it, but this did NOT affect anyone's ability to use the books they already owned. It did affect the ability to buy books for a while. It was, hopefully, a one-time event, and they have taken steps to prevent it in the future.

Dale

tompe
12-03-2007, 03:18 PM
This is misinformation. They did have a problem with a security breach at one point and shut down the server to fix it, but this did NOT affect anyone's ability to use the books they already owned. It did affect the ability to buy books for a while. It was, hopefully, a one-time event, and they have taken steps to prevent it in the future.


If you are going to be entirely correct, then this is misinformation if by "bought" you mean that you have paid for the book and it is downloadable. If you have to replace your reading device during the period their computer is not working, you cannot read your books.

DaleDe
12-03-2007, 06:22 PM
If you are going to be entirely correct, then this is misinformation if by "bought" you mean that you have paid for the book and it is downloadable. If you have to replace your reading device during the period their computer is not working, you cannot read your books.

You talked like it was this major problem affecting all users drastically, and now it is reduced to people who had a hardware failure and had to actually replace their reading device in the week that it was down. This was very unclear from your original posting.

Dale

tompe
12-03-2007, 09:49 PM
You talked like it was this major problem affecting all users drastically, and now it is reduced to people who had a hardware failure and had to actually replace their reading device in the week that it was down. This was very unclear from your original posting.


I did not talk about anything; it was another person. I just think that if you say that something is misinformation and correct it, the correction should not contain any misinformation. As a general principle.

schmidt349
12-03-2007, 11:37 PM
Mobi's server crashing would not have the slightest effect on existing books; it would simply prevent you from buying new books, or encoding books that you've bought previously for a new device.

Great, so Mobipocket goes belly-up, my e-reader breaks a year later, I replace my PC, and THEN I'm SOL.

Compare my paper books, all of which I'm still able to read with as little difficulty as their original audiences even though some are more than a century old.

This will be a DRM good/bad-type debate, though, and offtopic here.

HarryT
12-04-2007, 03:21 AM
Mobi are, to repeat, one of the oldest names in the business. They are highly unlikely to go "belly up". It's a chance I'm willing to take. If you aren't, the solution is simple - don't buy books from them; nobody is forcing you to. We all have to make an informed choice about these matters.

DaleDe
12-04-2007, 12:27 PM
I did not talk about anything; it was another person. I just think that if you say that something is misinformation and correct it, the correction should not contain any misinformation. As a general principle.

Sorry, when there are 6 pages of articles in the thread, it is hard to keep track of who is saying what. I think I did quote what you said, which seemed to me to be supporting the original premise. If you were not supporting the original premise, then I misunderstood you.

Dale

Barry Scott
12-09-2007, 01:08 PM
Since I rarely re-read books, I regard this as a complete "non issue". I buy books to read NOW - I don't particularly care whether or not I'll be able to re-read them in 20 years' time. How many of the books that you bought 20 years ago do you still have, and read, now?

This is where I have a problem. I have books that are 20 years old, and I like the fact that I can read them again (and do). I am not interested in Sony's proprietary format, as we know how that often works out for them (fingers crossed for Blu-ray though), nor am I confident in Amazon's commitment to MOBI, especially after they screwed all their existing customers who have spent money on DRM'd books (basically, if you have spent hundreds of pounds on DRM'd MOBI books, buy them now on Kindle or [Expletive deleted] off).

If maybe they allowed the Kindle to read DRM'd MOBI files, that would show some commitment to the format, but more importantly to the customer. As it stands, I think Mobipocket would be more appealing to me if Amazon sold it; that way it could be controlled by someone who had not basically just released a replacement for it.

I think I will go with the Sony just because they have said they will support epub and digital editions. By supporting an alternative DRM'd format, I think this shows commitment to the consumer and I trust Adobe having used their products for a long time. If Bookeen announce a date for epub support and that they will support digital editions, then I might be tempted, but I think the Sony is the nicer device of the two.

macewan
12-26-2007, 01:26 PM
apt-get install pyrite-publisher
pyrpub file.txt
pilot-xfer -i file.pdb

Oh, they have released a Linux version of these tools then. Or are you just assuming that everybody uses Windows?

tompe
12-26-2007, 01:40 PM
apt-get install pyrite-publisher
pyrpub file.txt
pilot-xfer -i file.pdb

That generates PalmDoc files from text files, not MobiPocket files. But my Mobiperl, described in another thread, can now generate MobiPocket files on any platform that has Perl installed.
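PalmDoc files like the ones pyrite-publisher writes compress their text with a simple LZ77 variant, and MOBI files commonly use the same scheme (compression type 2 in the record 0 header) for their text records, so a decoder for it is a natural first step toward reading either. Below is a sketch of that decoder following the widely documented byte ranges; it is not taken from Mobiperl or pyrite-publisher, and it does no validation of malformed input.

```python
def palmdoc_decompress(data: bytes) -> bytes:
    """Decompress one PalmDOC (LZ77-style) compressed text record."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if c == 0x00 or 0x09 <= c <= 0x7F:
            out.append(c)  # literal byte
        elif c <= 0x08:
            out += data[i:i + c]  # 0x01-0x08: copy the next c bytes verbatim
            i += c
        elif c <= 0xBF:
            # 0x80-0xBF: 2-byte back-reference, 11-bit distance, 3-bit length (+3)
            pair = (c << 8) | data[i]
            i += 1
            distance = (pair >> 3) & 0x07FF
            length = (pair & 0x07) + 3
            for _ in range(length):
                out.append(out[-distance])  # byte by byte, so overlapping copies work
        else:
            out += b" " + bytes([c ^ 0x80])  # 0xC0-0xFF: a space plus one ASCII char
    return bytes(out)
```

For instance, `palmdoc_decompress(b"abc\x80\x18")` expands the back-reference (distance 3, length 3) to `b"abcabc"`.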

macewan
12-26-2007, 01:50 PM
That generates PalmDoc files from text files, not MobiPocket files. But my Mobiperl, described in another thread, can now generate MobiPocket files on any platform that has Perl installed.


ah, thanks for clarifying