09-10-2010, 04:25 PM #4
chaley
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Problem 1: whatever checks the hash must know when to regenerate it. Calibre doesn't know when I edit an epub, when a viewer drops bookmarks into it, or when other operations legitimately change the file. The user might know, though.

Problem 2: telling me that a file is already corrupt is too late; I want the file repaired. Since that isn't going to happen, I keep one set of backups on a RAID disk and another set on DVD. Note, though, that I still need to know when to go get the backup. That takes us to ...

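Mitigation 1: epub (at least) is in fact a zip file, and zip stores a CRC-32 checksum for every file inside the archive. I think that mobi is protected as well. Such file types are easily scanned using existing tools (for example, unzip -t tests every member of a zip archive).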
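As a rough illustration of the zip-based check, here is a minimal Python sketch using the standard library's zipfile module. ZipFile.testzip() reads every member and returns the name of the first one whose CRC-32 does not match, or None. The function names check_epub and scan_library are hypothetical, and the sketch only covers .epub files; it does not attempt to validate mobi.

```python
import zipfile
import zlib
from pathlib import Path


def check_epub(path):
    """Return None if the zip's member CRC-32s all match, else a short problem description."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads each member and verifies its CRC-32;
            # it returns the name of the first bad member, or None.
            bad = zf.testzip()
    except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
        # Damage to the archive structure (or undecompressable data)
        # surfaces as an exception rather than a bad member name.
        return f"not a readable zip: {exc}"
    return None if bad is None else f"CRC mismatch in member {bad!r}"


def scan_library(root):
    """Yield (path, problem) for every damaged .epub found under root."""
    for path in sorted(Path(root).rglob("*.epub")):
        problem = check_epub(path)
        if problem is not None:
            yield path, problem
```

For example, `for path, problem in scan_library("/path/to/Calibre Library"): print(path, problem)` would list only the damaged files. A scan like this can only report damage, not repair it, which is why the backups mentioned above still matter.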

Mitigation 2: you can do this today using external tools and calibre's command line. For example, make a custom column called sha1. Use whatever tool you wish to compute the SHA1s of all the files for a book, saving the output as a long string. Use calibredb set_custom to write that string into the database, and calibredb list to extract it later and compare the hashes. On Linux, for example, you could use sha1sum to generate a set of hashes and sha1sum --check to verify them.

A simpler alternative, which needs no custom column, is to periodically compare checksums against a stored checksum list. Regenerate the list from time to time (such as when things change), and check the sums at whatever frequency you want.
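A minimal sketch of the "stored checksum list" approach in Python, assuming the library is a plain directory tree and that file names contain no newlines. The function names (sha1_of, write_manifest, verify_manifest) and the default file patterns are hypothetical. Each manifest line uses the "<hex digest><two spaces><relative path>" layout that sha1sum writes, so running sha1sum --check from inside the library directory should also accept the file.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Hex SHA1 of a file, read in chunks so large PDFs don't fill memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root, manifest, patterns=("*.epub", "*.mobi")):
    """(Re)generate the checksum list; run this when files legitimately change."""
    root = Path(root)
    # Key by relative path so overlapping patterns don't produce duplicates.
    entries = {
        p.relative_to(root).as_posix(): p
        for pattern in patterns
        for p in root.rglob(pattern)
        if p.is_file()
    }
    with open(manifest, "w", encoding="utf-8", newline="\n") as out:
        for rel in sorted(entries):
            out.write(f"{sha1_of(entries[rel])}  {rel}\n")
    return len(entries)


def verify_manifest(root, manifest):
    """Return [(relative_path, 'FAILED' | 'MISSING')] for files that don't match."""
    root = Path(root)
    problems = []
    for line in Path(manifest).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        path = root / rel
        if not path.is_file():
            problems.append((rel, "MISSING"))
        elif sha1_of(path) != expected:
            problems.append((rel, "FAILED"))
    return problems
```

Calling write_manifest after a deliberate edit and verify_manifest on a schedule (from cron, say) splits the work the same way the post does: the user decides when the list is regenerated, and the check simply reports which files need to be restored from backup.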

Comment 1: I am not convinced that I want calibre to be involved in archival issues like this. First, archive verification is a personal thing, tied to each user's backup scheme and preferences. Second, calibre changes very quickly, so compatibility problems will certainly arise. Third, development and maintenance would be taxing for a small team of volunteers.

Comment 2: It should be possible for an interested party to build some tools that run alongside calibre. The techniques mentioned above could be used, or perhaps others.