09-10-2010, 07:56 PM #8
BookGnome
Posts: 4
Karma: 62
Join Date: Sep 2010
Device: Kindle
Quote:
Originally Posted by kovidgoyal
use, though better, IMO, would be to add support for declaring a particular format as the "Master format". calibre would then ask for confirmation before running a conversion that would overwrite it.
I really like this idea, because right now Calibre treats all versions as equal, even if they were gathered from different sources. Having a canonical version for a given book might mean a change in user workflow (e.g., if you buy both an EPUB and a Mobipocket version of a book, you should store them as separate books, since each purchased file is an original in its own right).
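A minimal sketch of the master-format guard, in Python since that's what calibre is written in. `BookRecord`, `save_converted`, and the `confirm` callback are hypothetical names for illustration, not calibre's actual API. The only rule it encodes is the one described above: a conversion whose output would replace the format marked as master is written only if the user confirms.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class BookRecord:
    """Hypothetical stand-in for a library entry; not calibre's real data model."""
    title: str
    formats: Dict[str, bytes] = field(default_factory=dict)  # e.g. {"EPUB": b"..."}
    master_format: Optional[str] = None  # the format treated as canonical


def save_converted(book: BookRecord, fmt: str, data: bytes,
                   confirm: Callable[[str], bool]) -> bool:
    """Store a conversion result, asking first if it would overwrite the master.

    Returns True if the data was written, False if the user declined.
    """
    fmt = fmt.upper()
    if fmt == book.master_format and fmt in book.formats:
        question = f"{fmt} is the master format of {book.title!r}. Overwrite it?"
        if not confirm(question):
            return False
    book.formats[fmt] = data
    return True


if __name__ == "__main__":
    book = BookRecord("Example", {"EPUB": b"original"}, master_format="EPUB")
    # A non-master target is written without asking.
    save_converted(book, "MOBI", b"converted", confirm=lambda q: False)
    # The master is only overwritten if confirm() returns True.
    written = save_converted(book, "epub", b"regenerated", confirm=lambda q: False)
    print(written, book.formats["EPUB"])  # False b'original'
```

In a GUI the `confirm` callback would be a dialog; passing it in keeps the rule itself testable without one.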

Quote:
Originally Posted by kovidgoyal
As far as combating bitrot is concerned, I don't really see how the hash would help. After all, say calibre tells you that the file has changed (this would only happen if you ask calibre to check, for example during a db integrity check). Then what? The file has already "rotted" not much calibre can do about it. I suppose you could then go into your backups to try to find a pre-rotted version.
Well, even if the only thing it did was notify you that things were hosed, that's better than not knowing. Then you could restore a good copy from backups, download it again, or buy another copy, whatever was needed.
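A rough sketch of what that check could look like, assuming a hypothetical store that maps each file's path to the SHA-256 digest taken when the file was added (the store and the function names are illustrative, not something calibre provides). Any file that is missing or whose digest no longer matches gets reported:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large files don't fill memory


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def check_integrity(recorded: Dict[str, str]) -> List[str]:
    """Compare each file against the digest recorded when it was added.

    `recorded` maps file paths to hex digests (a hypothetical store).
    Returns the paths that are missing or whose contents have changed.
    """
    damaged = []
    for path, expected in recorded.items():
        p = Path(path)
        if not p.is_file() or file_sha256(p) != expected:
            damaged.append(path)
    return damaged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        book = Path(d) / "book.epub"
        book.write_bytes(b"original contents")
        recorded = {str(book): file_sha256(book)}  # taken at import time
        print(check_integrity(recorded))  # []
        book.write_bytes(b"original c0ntents")  # simulate a damaged byte
        print(check_integrity(recorded) == [str(book)])  # True
```

This only tells you something rotted; getting a good copy back is still up to backups or recovery data.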

You'd also get something else for free with the hash: the ability to quickly identify exact duplicates in the database. You wouldn't have to rely on book metadata such as author or ISBN, or on file size: if two copies have the same hash, they're the same book. That doesn't mean other kinds of duplicate checking aren't useful, but finding two books with the same title by Piers Anthony wouldn't tell you whether both copies were byte-for-byte duplicates. Knowing that would make it easier for the user to decide whether a book should be rejected as already in the database or added anyway as an alternative version; without the hash, there's no really good way to tell.
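Exact-duplicate detection then amounts to grouping files by digest. A minimal sketch with a hypothetical function name; it deliberately never looks at title, author, or any other metadata:

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_exact_duplicates(files: Iterable[Tuple[str, bytes]]) -> List[List[str]]:
    """Group file names whose contents are byte-for-byte identical.

    `files` yields (name, contents) pairs. Only groups with more than one
    member are returned, in the order their first member was seen.
    """
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for name, data in files:
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [names for names in by_digest.values() if len(names) > 1]


if __name__ == "__main__":
    library = [
        ("copy_a.epub", b"same bytes"),
        ("copy_b.epub", b"different bytes"),  # same title, different file
        ("copy_c.epub", b"same bytes"),
    ]
    print(find_exact_duplicates(library))  # [['copy_a.epub', 'copy_c.epub']]
```

Two files with matching metadata but different digests (like `copy_b.epub` here) are exactly the "alternative version" case the hash lets you tell apart.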

As for repair, that's a secondary issue, but one I think is solvable by optionally storing a small amount of recovery data (say, 10% of the file's size) in the directory with each canonical copy of a book. You wouldn't need it for formats you can regenerate from your known-good master copy, so overall file system usage wouldn't grow by 10%; only the master formats would carry that extra 10%.
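The overhead arithmetic, as a rough sketch: if recovery data equal to `redundancy` times the file size is stored only for master files, the library grows by redundancy * (master bytes / total bytes). The function name and the file sizes in the example are made up for illustration.

```python
from typing import Iterable, Tuple


def recovery_overhead(files: Iterable[Tuple[int, bool]],
                      redundancy: float = 0.10) -> float:
    """Fractional growth of the library when only masters get recovery data.

    `files` yields (size_in_bytes, is_master) pairs.
    """
    total = master = 0
    for size, is_master in files:
        total += size
        if is_master:
            master += size
    if total == 0:
        return 0.0
    return redundancy * master / total


if __name__ == "__main__":
    # Made-up sizes for one book: an EPUB master plus two derived formats.
    book_files = [
        (2_000_000, True),   # EPUB, master
        (2_500_000, False),  # MOBI, regenerated from the EPUB
        (5_500_000, False),  # PDF, regenerated from the EPUB
    ]
    print(f"{recovery_overhead(book_files):.1%}")  # 2.0%
```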

Whether Calibre shells out to par2create/par2repair or integrates a Reed-Solomon library directly (perhaps this Python library?), I think Calibre could easily add the ability to recover from file system corruption.
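par2 uses Reed-Solomon codes, so N recovery blocks can repair up to N damaged blocks. The sketch below swaps in a much simpler scheme just to show the idea: per-block SHA-256 digests locate the damaged block, and a single XOR parity block rebuilds it. It can fix at most one bad block per file, is not compatible with par2's file format, and the block size and function names are hypothetical.

```python
import hashlib
from typing import List, Tuple

BLOCK_SIZE = 4096  # hypothetical block size

Recovery = Tuple[int, List[str], bytes]  # (original length, block digests, parity)


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut data into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)] or [b""]
    blocks[-1] = blocks[-1].ljust(size, b"\0")
    return blocks


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(len(blocks[0]), "big")


def make_recovery_data(data: bytes) -> Recovery:
    """Record per-block digests plus one XOR parity block for `data`."""
    blocks = split_blocks(data)
    digests = [hashlib.sha256(b).hexdigest() for b in blocks]
    return len(data), digests, xor_blocks(blocks)


def repair(damaged: bytes, recovery: Recovery) -> bytes:
    """Return the original bytes, rebuilding at most one corrupted block."""
    length, digests, parity = recovery
    blocks = split_blocks(damaged)
    if len(damaged) != length or len(blocks) != len(digests):
        raise ValueError("file length changed; this sketch only handles bit rot")
    bad = [i for i, b in enumerate(blocks)
           if hashlib.sha256(b).hexdigest() != digests[i]]
    if len(bad) > 1:
        raise ValueError(f"{len(bad)} blocks damaged; one parity block fixes only one")
    if bad:
        i = bad[0]
        # XOR of the parity with every intact block leaves the missing block.
        others = [b for j, b in enumerate(blocks) if j != i]
        blocks[i] = xor_blocks(others + [parity])
        if hashlib.sha256(blocks[i]).hexdigest() != digests[i]:
            raise ValueError("rebuilt block does not match; recovery data is damaged")
    return b"".join(blocks)[:length]


if __name__ == "__main__":
    original = bytes(range(256)) * 100  # 25,600 bytes, i.e. 7 blocks
    recovery = make_recovery_data(original)
    corrupted = bytearray(original)
    corrupted[5000] ^= 0xFF  # flip bits in the second block
    print(repair(bytes(corrupted), recovery) == original)  # True
```

Raising the redundancy toward the 10% suggested above is where a real Reed-Solomon implementation (or par2 itself) earns its keep; a single parity block cannot go beyond one repair.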

I definitely agree that identifying corruption, and preventing silent corruption (the worst kind, IMHO), is more important than fixing it. Knowing something is wrong always has to be the first step.