MobileRead Forums - View Single Post

sethcohn · 01-01-2013, 04:10 PM

Similar Stories won't do what I'm looking for at all. It's completely different.
It looks at a single item, creates an index, and looks for matching items, ranking every single item via a new column. That's completely unneeded, and counterproductive.

Duplicate Finder creates a hash table, which is the right way of doing this. The difference is between creating a hash of the entire file and creating a hash of only the items that matter, ignoring those that don't (the metadata.opf file, for example, is guaranteed to differ in small ways if it has been altered at all).
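
To make the distinction above concrete, here is a rough sketch of what I mean. It is not how Duplicate Finder or any existing plugin works. It assumes the book is a zip container (an EPUB, say) and that skipping entries ending in .opf is enough to ignore the metadata; the function names and the skip list are made up for illustration.

```python
import hashlib
import zipfile

# Hypothetical skip list: entries that tend to change when metadata is edited.
IGNORED_SUFFIXES = (".opf",)


def file_hash(path):
    """Hash of the entire file: changes if any byte changes, metadata included."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def content_hash(path, ignored_suffixes=IGNORED_SUFFIXES):
    """Hash of only the entries that matter inside a zip-based book."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(path) as zf:
        # Sort names so the order of entries in the archive does not matter.
        for name in sorted(zf.namelist()):
            if name.endswith("/") or name.lower().endswith(ignored_suffixes):
                continue  # skip directory entries and ignored metadata files
            digest.update(name.encode("utf-8"))
            digest.update(b"\0")
            digest.update(hashlib.sha256(zf.read(name)).digest())
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by content hash; return only groups with more than one book."""
    groups = {}
    for p in paths:
        groups.setdefault(content_hash(p), []).append(p)
    return [g for g in groups.values() if len(g) > 1]
```

Two copies of the same book whose metadata was edited separately would get different file hashes but the same content hash, so only the second approach puts them in the same bucket of the hash table.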

If you aren't interested, fine. Kovid suggested your plugin, and I gave you (and him) the courtesy of asking here, even though you clearly aren't interested. My original post (linked above) points to tools outside of Calibre that are useful for this sort of comparison, and frankly, I see the value of doing it from within Calibre (especially for importing a large quantity of fresh books into an already large library, perhaps from completely public domain sources...) even if you still don't. I hope someone else out there finds the pointer useful, and I hope someone creates a plugin that will do this sort of dupe checking (content-based hashes, not file-based hashes) one of these days.

Last edited by sethcohn; 01-01-2013 at 04:11 PM. Reason: grammar correction