Thread: What a regex is
Old 05-10-2010, 04:30 AM   #20
chaley
Quote:
Originally Posted by Disfrutalavida View Post
I dislike "black boxes" intensely. I always want to know how things work.
I agree with you here! I try very hard to look in the boxes. Haven't had good luck with changing their contents, though.

Quote:
I don't see how referencing the files by the (xxx) number in the folder would cause data integrity problems, but without knowing more about how the tables are set up I will defer to you.
Having spent a fair amount of time inside calibre recently, I think I can infer some basic design decisions. I am not 100% certain, as I wasn't around when Kovid did the design work, but here goes.

Basically, calibre does four things: it stores data about publications in a way similar to bibliographic managers like Zotero or EndNote, converts publications between formats, stores those formats for use in various ways, and interfaces with reading devices. We can argue about whether it should do these things, but that is beside the point.

Because of 'things' 2-3, and to some extent 4, calibre wants to have guaranteed access to formats either supplied by the user or that it creates. I don't want to say 'books', because even though a book might exist in several forms, it is still the same book from the metadata standpoint. In order to guarantee access, it insists that the path to a format be 1) computable using book metadata, and 2) unique. BLOBs in a database have these properties, but are not ideal for reasons ranging from incremental backup capability to single errors causing complete loss of data.

There are many schemes that can conform to these two criteria, and yours is one of them. I can imagine a folder containing thousands of files named nnn.format, but that raises the problem of folder size. Kovid probably wanted a scheme that partitions the file space into folders that are not so large as to overflow some OS's folder capacity or become so slow as to be unusable. He probably also wanted a structure that a human could make sense of, so that backup/recovery is easier.
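
To make the two criteria concrete, here is a minimal sketch of a path that is computable from metadata and unique. It is my own illustration, not calibre's code: the names safe_component and format_path, the 40-character limit, and the author/title (id)/file layout are all assumptions made for the example.

```python
import re
from pathlib import PurePosixPath


def safe_component(text, max_len=40):
    """Replace characters that are awkward in file names and trim the length."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", text).strip(" .")
    return cleaned[:max_len] or "Unknown"


def format_path(author, title, book_id, fmt):
    """Compute a library-relative path for one format from metadata alone.

    The numeric id in the folder name keeps paths unique even when two
    books share the same author and title (hypothetical layout).
    """
    folder = f"{safe_component(title)} ({book_id})"
    filename = f"{safe_component(title)}.{fmt.lower()}"
    return PurePosixPath(safe_component(author)) / folder / filename


print(format_path("Jane Austen", "Emma", 42, "EPUB"))
# prints: Jane Austen/Emma (42)/Emma.epub
```

Because the id is part of the folder name, two records with identical author and title still land in different folders, and grouping by author keeps any single folder from growing without bound.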

One scheme that does not conform is storing references to files contained in random folders. Neither constraint is met: the path is not computable and there is no guarantee of uniqueness. In addition, because the folders are 'owned' by something other than calibre, the problem of data integrity arises. The files can easily be moved, altered, or even deleted.

Clearly Kovid could have made different choices. For example, he could have implemented libraries in the same fashion as MediaMonkey, which supports a DB + path-to-file scheme. However, different choices lead to different problems. For example, I use MediaMonkey, and several times I have been required to rebuild the library because of mistakes and naming scheme clashes. MM also supports a calibre-like scheme where MM renames and copies my music into a 'library', which I have used for some years now because it makes library maintenance so much easier. But all that aside, Kovid made the choices he did, and either we live with them, go somewhere else, or get into the code and change them.

Quote:
I didn't think about Asian languages, since I never had to deal with them in my work which was all with US companies and government.
Funny, since I had a Chinese fiancée who was Lin Fang in Chinese, but Fang Lin in English.
I ran into this problem while living in Malaysia. We had to deal with Chinese, Indian, and Arabic naming schemes, and no system worked. For example, I have a friend named nik xx binte yy, which means approximately Lady xx daughter of yy. She insists that she does not have a last name. Another friend is of Palestinian origin, and his name mixes systems (his-fn son of father's-fn ln); his problem is that his father's name is continually assumed to be his middle name (something he does not have). I had Chinese students who would use their Chinese name in the traditional way (LN FN) but also had western names written FN LN that usually weren't related to their Chinese names at all. Add to all of this European names like Wernher von Braun, where 'von' is both part of and not part of the last name (the last name is written von Braun but should be sorted under B), and I pull (what is left of) my hair out.
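
A rough sketch of why no single rule works, again purely illustrative: the PARTICLES set, family_name, sort_key, and the override argument are names I made up for this post, and the particle list is nowhere near complete.

```python
PARTICLES = {"von", "van", "de", "der"}  # illustrative only, far from complete


def family_name(name, family_first=False):
    """Guess the family name as written, keeping particles ('von Braun')."""
    parts = name.split()
    if not parts:
        return ""
    if family_first or len(parts) == 1:
        return parts[0]
    idx = len(parts) - 1
    # Pull preceding particles into the family name, but never the first word.
    while idx > 1 and parts[idx - 1].lower() in PARTICLES:
        idx -= 1
    return " ".join(parts[idx:])


def sort_key(name, family_first=False, override=None):
    """Return the key to sort under; an explicit override always wins."""
    if override is not None:
        return override.casefold()
    words = family_name(name, family_first).split()
    # 'von Braun' is written with the particle but filed under B.
    while len(words) > 1 and words[0].lower() in PARTICLES:
        words.pop(0)
    return " ".join(words).casefold()


print(family_name("Wernher von Braun"))          # von Braun
print(sort_key("Wernher von Braun"))             # braun
print(sort_key("Lin Fang", family_first=True))   # lin
print(sort_key("Fang Lin"))                      # lin
```

The guess is simply wrong for someone like my friend with no family name at all, which is why the sketch takes an override: in the end a person has to be able to say how a given name sorts.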

Quote:
Glad to see someone else as old as I around here.


Charles