07-31-2011, 08:40 AM   #115
kiwidude
Calibre Plugins Developer
@rigolo - welcome to MobileRead.

In answer to your question - none of the Find Duplicates comparisons ever look inside the format (e.g. inside the EPUB). Nor do they look directly at the metadata.opf files sitting in each book's folder. All but the binary comparison use the data stored in the metadata.db database that Calibre uses to manage your library. In theory this should match what those metadata.opf files contain, but as I said above, the opf files themselves are never compared.
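
To make the idea of a metadata-only comparison concrete, here is a minimal Python sketch. It is not the plugin's actual code: the `normalize` and `find_metadata_duplicates` names and the matching rule (lowercase, ignore punctuation) are assumptions for illustration only. The input is a plain list of (id, title, authors) tuples of the kind you could read out of a library database; nothing here opens a book file.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse punctuation/whitespace (hypothetical matching rule)."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_metadata_duplicates(books):
    """Group book ids whose normalized title and authors match.

    `books` is an iterable of (book_id, title, authors) tuples, where
    `authors` is a list of strings. Only this metadata is compared;
    the book files themselves are never read.
    """
    groups = defaultdict(list)
    for book_id, title, authors in books:
        # Sort the authors so their order does not affect the match.
        key = (normalize(title), tuple(sorted(normalize(a) for a in authors)))
        groups[key].append(book_id)
    # Only groups with more than one book are potential duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    library = [
        (1, "The Hobbit", ["J.R.R. Tolkien"]),
        (2, "the hobbit!", ["J.R.R. Tolkien"]),
        (3, "Dune", ["Frank Herbert"]),
    ]
    print(find_metadata_duplicates(library))
```

Two records land in the same group purely because their stored title and authors agree, which is why this kind of check says nothing about whether the text inside the files matches.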

The binary comparison is exactly that - it checks whether two files match, effectively byte for byte.
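
For anyone wondering what a binary comparison looks like in practice, below is a rough Python sketch. How the plugin actually does it is not shown here, so treat this as an assumption: it groups files by size first and then by SHA-256 digest, which is one common way to get an "effectively byte for byte" match. The function names `file_digest` and `find_binary_duplicates` are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_binary_duplicates(paths):
    """Return groups of paths whose contents hash identically."""
    # Files of different sizes cannot be identical, so bucket by size first.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size means no possible duplicate
        for path in same_size:
            by_hash[file_digest(path)].append(path)

    return [group for group in by_hash.values() if len(group) > 1]
```

Note that a check like this treats two EPUBs as different if even one byte differs, such as a changed cover image or stylesheet, regardless of whether the text is the same.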

Trying to compare the internal contents of a book format using this plugin is not possible, and I have no desire to extend it to do so. It was discussed a little IIRC on the duplicates thread in the development forums. For a start, it would be intolerably slow. Secondly, it wouldn't work with all formats (you have mentioned EPUB only - this plugin looks for duplicates across all formats). And thirdly, where do you draw the line - what about a slightly different cover image, a tweak to the stylesheet, etc.?

All this plugin can do is put you in the ballpark by telling you that two formats appear to be duplicates based on the title, authors etc. that you have associated with them in Calibre. Whether you then decide their text contents are "near identical", as part of your resolution process for choosing which to keep, is a whole different kettle of fish, and not something I see this plugin ever attempting to address. As I have mentioned several times before, I see it as potentially something that an enhanced "SmartMerge" plugin could attempt to do. However, I personally don't have a need for it any more (I have changed how I add books to my library to reduce the likelihood of duplicates in the first place), so I leave it to someone else to develop such a plugin...