Old 10-26-2010, 04:29 PM   #5
speakingtohe
@Giuseppe Chillemi
Can't you search for unknown?

Quote:
File Hash duplicate check -> Ask for deletion.
Then:
Start the current Tag based duplicate check.

Believe me when I say your library will never have duplicates and believe me when I say this solves the Calibre Crash problem that generates duplicates. If the TAG library crashes and you have 100 "unknown - Unknown" books the tag mechanism won't work while the hash mechanism will find 100% of them, either checking against existing books or if you restart the batch, against new added books.

How can this work? If you modify a book (even just by importing it into calibre) by adding metadata, changing the title, etc., the hash will change. The hash would have to be calculated before importing, which could slow things down overall.
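
To illustrate why an edited file no longer matches its original, here is a minimal Python sketch. The byte strings stand in for a book file before and after a metadata edit, and `content_hash` is a hypothetical helper name, not anything from calibre.

```python
import hashlib

def content_hash(data: bytes) -> str:
    # SHA-256 digest of the raw bytes; any change to the bytes changes the digest.
    return hashlib.sha256(data).hexdigest()

# Stand-ins for a book file before and after its title metadata is edited.
original = b"<html><head><title>Unknown</title></head><body>Chapter 1</body></html>"
edited = original.replace(b"<title>Unknown</title>", b"<title>A Real Title</title>")

print(content_hash(original) == content_hash(original))  # True
print(content_hash(original) == content_hash(edited))    # False
```

The book text is identical in both versions, yet the digests differ because the title bytes changed.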

Even a hash based on the title alone would be useless, since many titles are used by more than one author. With title plus author, you have to deal with things like spaces or periods after initials.

For example, J. D. Robb could be written in many ways:
J. D. Robb
J.D. Robb
J D Robb
Nora Roberts as J. D. Robb.
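
The variants above can be partly tamed by normalizing the author string before comparing or hashing it. This is only a sketch of one possible approach; `author_key` is a made-up helper, not part of calibre, and it does nothing for pen-name forms like the last one.

```python
def author_key(name: str) -> str:
    # Treat periods as spaces, lowercase, and split on whitespace.
    tokens = name.replace(".", " ").lower().split()
    merged = []  # list of (text, is_run_of_initials)
    for tok in tokens:
        if len(tok) == 1 and merged and merged[-1][1]:
            # Glue consecutive single-letter initials together: "j", "d" -> "jd".
            merged[-1] = (merged[-1][0] + tok, True)
        else:
            merged.append((tok, len(tok) == 1))
    return " ".join(text for text, _ in merged)

print(author_key("J. D. Robb"))                  # jd robb
print(author_key("J.D. Robb"))                   # jd robb
print(author_key("J D Robb"))                    # jd robb
print(author_key("Nora Roberts as J. D. Robb"))  # nora roberts as jd robb
```

The first three spellings collapse to the same key, but the fourth still does not match, which is the kind of case that keeps a title-author hash from being foolproof.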

The only hash code that would actually work is one generated from the original file, and that could not be computed until the file was added.
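
As a rough sketch of what hashing the original files at add time might look like, the following groups files by the digest of their untouched bytes. The function names are hypothetical, and this is not how calibre does its duplicate check; it only catches copies that are byte-for-byte identical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_digest(path, chunk_size=65536):
    # Hash the file in chunks so large books are not read into memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_identical_files(paths):
    # Group paths by digest; any group with more than one entry is a set of exact copies.
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(Path(p))
    return [group for group in groups.values() if len(group) > 1]
```

Calling `find_identical_files` on a list of paths about to be imported would return the groups of exact copies. Every file has to be read in full, which is where the extra time on import would come from.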


While pretty old school, hash codes still have their place, but I don't think they are a viable solution in this instance. The current duplicate detection is pretty good, although not foolproof.

Of course, you can prove me wrong by writing a foolproof hash-code-based calibre module to do this.

Regards
Helen