MobileRead Forums - View Single Post

ixtab · 01-05-2013, 02:14 PM

Quote:

Originally Posted by HarryT

Performance is probably also better, because you can do queries without (or with fewer) table joins. A lot of my "day job" work is with databases, and we often end up with a certain degree of denormalisation for both the reasons I've mentioned.

Sorry, I only read this part of your reply now.

Yes, that's a valid point, and denormalization does make sense at times. But before we start: how many terabytes of data does an e-reader hold, and how many thousands of concurrent transactions does it have to process per second? Are we talking about hundreds of table joins, where each table could contain millions of records?

...

Of course, the following are all unsubstantiated claims; nobody except Amazon knows exactly what is happening in their code, after all. I'm just going by how Amazon organized their database schema. We're talking about 0 joins (as opposed to 2), followed by JSON parsing and (in most cases) at least one additional DB query to "simulate" the join, because the data has to be retrieved anyway. I actually doubt that they do all of the retrieval in one go; I rather think that it will be n additional queries.
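To make that access pattern concrete, here is a rough sketch of what I mean. Everything in it is made up for illustration: the table and column names (books, collections, members_json), the sample data, and the use of SQLite through Python are my assumptions, not Amazon's actual schema or code. It only shows the shape of "0 joins, then JSON parsing, then n more queries".

```python
import json
import sqlite3

# Hypothetical schema (NOT Amazon's real one): each collection row keeps its
# member IDs as a JSON array in a plain text column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE collections (name TEXT PRIMARY KEY, members_json TEXT);
""")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("b1", "Dune"), ("b2", "Emma"), ("b3", "Ulysses")],
)
conn.execute(
    "INSERT INTO collections VALUES (?, ?)",
    ("Favourites", json.dumps(["b1", "b3"])),
)


def titles_in_collection(conn, name):
    """Return (titles, query_count), doing the 'join' in client code."""
    query_count = 1  # first query: fetch the JSON blob, no join involved
    row = conn.execute(
        "SELECT members_json FROM collections WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return [], query_count

    titles = []
    for book_id in json.loads(row[0]):  # parse the JSON on the client
        query_count += 1  # one additional query per member
        hit = conn.execute(
            "SELECT title FROM books WHERE id = ?", (book_id,)
        ).fetchone()
        if hit is not None:  # a dangling ID is silently skipped
            titles.append(hit[0])
    return titles, query_count


titles, query_count = titles_in_collection(conn, "Favourites")
print(titles, query_count)
```

For a collection with n members this issues 1 + n queries, and nothing stops the JSON from listing an ID that no longer exists in books.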

Essentially, I strongly suspect that they "reimplemented" the join in client code. So in terms of performance too, I'd bet that a proper database schema would outperform the current one, even for the extremely small database that a Kindle holds. Plus, it could also provide referential integrity at no cost.
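For comparison, this is roughly what I would call a "proper" schema. Again a sketch under assumptions: the table names (books, collections, collection_members), the sample data, and SQLite via Python are all invented here for illustration, and I make no claim about measured performance. It just shows the two joins done by the database in one query, and a foreign key rejecting a dangling reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE books (id TEXT PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE collections (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    -- Hypothetical junction table: one row per (collection, book) membership.
    CREATE TABLE collection_members (
        collection_id INTEGER NOT NULL REFERENCES collections(id),
        book_id       TEXT    NOT NULL REFERENCES books(id),
        PRIMARY KEY (collection_id, book_id)
    );
""")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("b1", "Dune"), ("b2", "Emma"), ("b3", "Ulysses")],
)
conn.execute("INSERT INTO collections (id, name) VALUES (1, 'Favourites')")
conn.executemany(
    "INSERT INTO collection_members VALUES (?, ?)", [(1, "b1"), (1, "b3")]
)


def titles_in_collection(conn, name):
    """One query, two joins; the DBMS does the work instead of client code."""
    rows = conn.execute(
        """
        SELECT b.title
        FROM collections c
        JOIN collection_members m ON m.collection_id = c.id
        JOIN books b ON b.id = m.book_id
        WHERE c.name = ?
        ORDER BY b.title
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


def try_add_member(conn, collection_id, book_id):
    """Return True if the membership row was accepted, False if rejected."""
    try:
        conn.execute(
            "INSERT INTO collection_members VALUES (?, ?)",
            (collection_id, book_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False  # the foreign key constraint refused the dangling ID


titles = titles_in_collection(conn, "Favourites")
dangling_accepted = try_add_member(conn, 1, "no-such-book")
print(titles, dangling_accepted)
```

The point of the second helper is the referential integrity part: a membership pointing at a non-existent book never makes it into the database, so client code does not have to check for it.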

Another unsubstantiated claim: if the Kobo really has that many problems with its database, then I assume that it's for the exact same reason. A database system can do so much more than just "put things in there and get them out of there", but one actually has to understand at least a little bit of the theory to use it efficiently. Seriously, the DB schema that Amazon uses is pretty much what any 16-year-old would intuitively come up with. They're essentially not using ANY of the benefits that a DBMS really offers - they could have just used plain files instead*.

(*) Yes, the collections actually were plain files before the K5. So they just seem to have ported that "to a database". But why build an entirely new API around it if they don't use the benefits of the DB? Just because "management asked that the collections must be stored in databases now"? What's the point?