Thread: What a regex is
Old 05-06-2010, 07:50 AM   #14
chaley
Quote:
Originally Posted by Disfrutalavida
Anyone old enough to remember DOS batch files?
Yep. And also 1620 arithmetic tables.
Quote:
Kovid, it looks like you are saving all the input files in the structure you decided on but referencing them with the incrementing (xxx) as part of the {title} folder name?
On my machine, the calibre library is saved as authors/title (number)/formats etc. Is the xxx you are referring to what I called 'number'? If so, then that number is the primary key of the book record, and can go as high as 2**63 - 1 (SQLite integer keys are signed 64-bit values).
Quote:
It also seems to continue to increment from where it left off even when records are deleted. Do you plan on reusing the (xxx) pointers?
It is an auto-increment field. One property of such fields is that their values are never reused.
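To illustrate the never-reused behaviour, here is a minimal sketch using Python's sqlite3 module and SQLite's AUTOINCREMENT keyword. The table and column names are made up for the example and are not taken from calibre's actual schema.

```python
import sqlite3


def id_after_delete_and_insert() -> int:
    """Insert three rows, delete the newest, insert one more, return its id."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; AUTOINCREMENT makes SQLite remember the highest id
    # it has ever handed out, even after that row is deleted.
    conn.execute(
        "CREATE TABLE books (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)"
    )
    for title in ("A", "B", "C"):
        conn.execute("INSERT INTO books (title) VALUES (?)", (title,))
    conn.execute("DELETE FROM books WHERE id = 3")
    cur = conn.execute("INSERT INTO books (title) VALUES (?)", ("D",))
    new_id = cur.lastrowid
    conn.close()
    return new_id


if __name__ == "__main__":
    print(id_after_delete_and_insert())
```

The new row gets id 4 rather than the freed id 3, which is why the number in the folder name keeps climbing after deletions.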
Quote:
On the bright side since you are using the (xxx) as the location pointer it should be easy to implement different output folder structures and add without saving so we can all have our cake and eat it too.
Unfortunately, it isn't easy without an almost complete rewrite of the library storage code. This isn't to say that the idea isn't good, only that it would be hard to do within the current structure.

Calibre assumes that the library is a black box, never to be looked at. Books are stored as pseudo-BLOBs, but instead of using a real BLOB the book is stored in the filesystem using a computed directory path. This is done (I think) for safety and performance, not to make the files available for processing by external tools.

If one thinks of the books as BLOBs, then copying the data in and out becomes the natural thing to do. The path becomes a form of table name, and the book formats (and other things) are columns within that table.
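As a rough sketch of what a "computed directory path" could look like, assuming only the authors/title (number) layout described earlier: the function names and the file-naming rule below are hypothetical, and real code would also have to sanitize characters that are not legal in file names.

```python
from pathlib import PurePosixPath


def book_dir(author: str, title: str, book_id: int) -> PurePosixPath:
    """Build the per-book folder: <author>/<title> (<id>)."""
    return PurePosixPath(author) / f"{title} ({book_id})"


def format_path(author: str, title: str, book_id: int, fmt: str) -> PurePosixPath:
    """Locate one format of a book inside its folder.

    The '<title>.<ext>' file name is an assumption made for this example.
    """
    return book_dir(author, title, book_id) / f"{title}.{fmt.lower()}"


if __name__ == "__main__":
    print(format_path("Jane Austen", "Emma", 42, "EPUB"))
```

Because the path is derived entirely from the record, the database never needs to store it, and nothing outside the library code is expected to rely on it.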
Quote:
This would allow building the metadata.db without saving the files to a new location.
But it would also make data integrity almost impossible to maintain.
Quote:
More flexibility and the ability to quickly and easily update the metadata from the file data if the metadata, as is the case in many of my files, is incorrect.
Not sure why having book formats stored outside the library gives these advantages.
Quote:
Is the Author's name only one field? Bet it is.
How difficult would it be to make it 2 fields so the output folders could be changed?
Currently {Author}, etc gives First Last.
I use Last, First for author names. Doing so isn't hard, as long as you start early when building up the library to avoid conversion hassles.

It is worth noting that although author is a single field (as you guessed), the collection of authors is normalized and not stored directly in the book record. Author_sort is a denormalized form of author, stored in the book record, so that the sort order can be corrected by hand. Without going into my normal rant about this, consider the difference in storing and sorting Chinese names vs Western-style names (or Japanese, for that matter). The complexities come close to forcing the denormalization.
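A small sketch of why a stored, editable sort value helps. The helper names and the dict-based book record are hypothetical, and the "move the last word to the front" rule is just a naive default chosen for illustration, not calibre's algorithm.

```python
def default_author_sort(author: str) -> str:
    """Naive guess: treat the last word as the family name."""
    parts = author.split()
    if len(parts) < 2:
        return author
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def sort_key(book: dict) -> str:
    """Prefer the stored author_sort; fall back to the naive guess."""
    return book.get("author_sort") or default_author_sort(book["author"])


if __name__ == "__main__":
    books = [
        {"author": "Jane Austen"},
        # Family name comes first here, so the naive rule would get it wrong;
        # the stored author_sort carries the manual correction.
        {"author": "Mao Zedong", "author_sort": "Mao Zedong"},
    ]
    for book in sorted(books, key=sort_key):
        print(sort_key(book))
```

The guess works for "Jane Austen" but would turn "Mao Zedong" into "Zedong, Mao", so the correction has to live somewhere, and the book record is the simplest place for it.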
Quote:
Sorry for the long post. I just had to get it all out.
Discussion is good.

Last edited by chaley; 05-06-2010 at 08:07 AM.