Old 04-14-2010, 09:28 AM   #15
Starson17
Wizard
Quote:
Originally Posted by Worldwalker View Post
I like the idea of a duplicate finder. Very much so. Hopefully it would fix a problem which I have, where I've got several versions of the same book (for example, from the Baen Free Library and the Baen CDs) and they're showing up as separate entries. They are sometimes, though not always, different file types. I'd like to merge them, or at the very least get rid of the spares without having to hunt them down individually.
Duplicates were one of the main reasons I wrote the merge record code. I submitted it while Kovid was on vacation, but I've been using it myself for over a month now.

Mostly, I had three situations: 1) two slightly dissimilar titles, just different enough that my fuzzy title matching code for Add books didn't match them; 2) books that I'd added to Calibre before I wrote that code; or 3) two slightly different author names, one with a middle initial and one without.
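To make cases 1 and 3 concrete, here is a rough sketch of the kind of normalization a fuzzy matcher might do. This is not the actual Add books code; the function names `fuzzy_title` and `fuzzy_author` and the specific rules (lowercase, strip punctuation, drop a leading article, drop single-letter initials) are assumptions chosen just for illustration.

```python
import re

def fuzzy_title(title):
    """Reduce a title to a crude comparison key (hypothetical rules)."""
    t = re.sub(r"[^\w\s]", " ", title.lower())    # punctuation -> spaces
    t = re.sub(r"^(the|a|an)\s+", "", t.strip())  # drop one leading article
    return " ".join(t.split())                    # collapse whitespace

def fuzzy_author(name):
    """Drop single-letter initials so a middle initial doesn't block a match."""
    parts = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(p for p in parts if len(p) > 1)

# Two records are treated as candidate duplicates when both keys agree.
def looks_like_duplicate(title_a, author_a, title_b, author_b):
    return (fuzzy_title(title_a) == fuzzy_title(title_b)
            and fuzzy_author(author_a) == fuzzy_author(author_b))
```

Anything looser than this (say, tolerating a subtitle on one copy only) starts producing false matches, which is why some pairs will always slip through and need a manual merge.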

The Add books code (which has limited duplicate detection) keeps me from getting too many duplicates, so I haven't felt very strongly about the need to improve the duplicate detection or provide a means of searching for duplicates. Even the best duplicate search algorithm is going to make mistakes. Instead, I wanted a quick and easy way to merge two records I consider to be duplicates, whenever I spot them. That was the purpose of the merge record code.

I will admit that I ran an SQL search on the database to find all books with identical titles. I then went through that list and used my merge code to merge records where appropriate, so I can see value in better duplicate detection. However, as Kovid says, it's something I'd only use during initial import of a large collection.
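For anyone who wants to try the same thing, a search like that can be sketched roughly as below. It assumes a SQLite database with a `books` table that has `id` and `title` columns; the table here is built in memory with made-up rows, so check the schema of your own library database (and work on a copy) before pointing anything like this at it.

```python
import sqlite3

def make_sample_db():
    """Build a throwaway in-memory database (assumed schema, made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO books (title) VALUES (?)",
        [("Sample Book",), ("Sample Book",), ("Another Book",)],
    )
    return conn

def find_identical_titles(conn):
    """Return (title, copies, comma-separated ids) for titles used more than once."""
    query = """
        SELECT title, COUNT(*) AS copies, GROUP_CONCAT(id) AS ids
        FROM books
        GROUP BY title
        HAVING COUNT(*) > 1
        ORDER BY title
    """
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    for title, copies, ids in find_identical_titles(make_sample_db()):
        print(f"{title}: {copies} copies (ids {ids})")
```

This only catches exact title matches, so it gives you a list to review by hand, not a list to merge blindly.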

After the initial import, I've handled dupes as I import smaller groups of books with a combination of the Add books code (which puts duplicate books of a different format into existing records) and the merge code (which lets me collapse two books with slightly different titles or authors into a single record).
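The idea of collapsing two records can be sketched as follows. This is only an illustration, not the merge code itself: the `Record` class, the `merge_records` function, and the rule that the destination keeps its own title, authors, and any format it already has are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A hypothetical book record: metadata plus one file per format."""
    title: str
    authors: list
    formats: dict = field(default_factory=dict)  # e.g. {"EPUB": "a.epub"}

def merge_records(dest, src):
    """Fold src into dest: keep dest's metadata, add formats dest lacks."""
    for fmt, path in src.formats.items():
        # Assumption: on a format clash, the destination's copy wins.
        dest.formats.setdefault(fmt, path)
    return dest

if __name__ == "__main__":
    keep = Record("Sample Book", ["John Smith"], {"EPUB": "a.epub"})
    spare = Record("Sample Book, The", ["John Q. Smith"],
                   {"EPUB": "b.epub", "LIT": "b.lit"})
    print(merge_records(keep, spare))
```

After the merge, the spare record would be deleted, leaving one entry that carries every format.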