Could you preserve the file dates (timestamps) when you unzip/rezip the epub?
Two reasons:
One is completely OCD, but if all I'm doing is removing unwanted files that somebody else (iTunes/calibre) added to my retail epub, I'd rather keep the original html intact, timestamps included.
The other is that it makes for an interesting anomaly: if I make a copy of an epub (which then matches in your binary compare) and strip the bookmarks file out of both copies at slightly different times, the copies no longer compare as equal.
Even though all the content (and hashes) is identical, the epub itself is different (and hashes differently) because the timestamps differ.
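To make the request concrete, here is a minimal sketch of what I mean, not how the plugin actually does it. It assumes Python's standard zipfile module; the function name strip_member and the unwanted parameter are made up for illustration. Each kept entry is written with its original stored timestamp instead of the current time:

```python
import zipfile


def strip_member(src_path, dst_path, unwanted):
    """Copy an epub, skipping entries named in `unwanted` (hypothetical helper).

    Each kept entry is written with the date_time stored in the source
    archive, so the rewrite does not stamp entries with the current time.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            if info.filename in unwanted:
                continue
            # Build a fresh ZipInfo that carries over the original metadata.
            kept = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            kept.compress_type = info.compress_type  # keeps `mimetype` stored
            kept.external_attr = info.external_attr
            dst.writestr(kept, src.read(info))
```

Because entry order, compression type, and timestamps all come from the source, two copies stripped at different times should end up with matching entries.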
This brings up another thought: would it be practical to add a "fuzzy" option to your binary compare, something where you open the epub and check the hash of the largest folder for a match?
(and should I bring this question up in the Duplicate Check thread?)
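Here is a rough sketch of the kind of "fuzzy" compare I have in mind. It is a variant of the idea above rather than the largest-folder check: it hashes every entry's content and ignores timestamps entirely. The names content_fingerprint, same_content, and IGNORED are hypothetical, and the ignore list is only my guess at the usual added files:

```python
import hashlib
import zipfile

# Assumed list of files that other tools add; adjust as needed.
IGNORED = frozenset({"META-INF/calibre_bookmarks.txt", "iTunesMetadata.plist"})


def content_fingerprint(epub_path, ignored=IGNORED):
    """Map each entry name to the SHA-256 of its uncompressed content."""
    with zipfile.ZipFile(epub_path) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if info.filename not in ignored and not info.is_dir()
        }


def same_content(path_a, path_b, ignored=IGNORED):
    """True when both epubs hold the same entries with identical content."""
    return content_fingerprint(path_a, ignored) == content_fingerprint(path_b, ignored)
```

With something like this, two copies that differ only in timestamps or in a leftover bookmarks file would still count as a match, while any change to the actual book files would not.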
Last edited by capnm; 05-30-2011 at 10:49 AM.
Reason: afterthought