I guess I wasn't really clear, no worries.
What I meant was that I agree the algorithm would be 'ignore title, fuzzy author'.
But in the case of 'books', we're actually looking for duplicates, so any set of duplicates in a particular group is something you would mark and put in the results that a user needs to sort through.
For merging authors, take the case where all the author variations are identical, e.g.:
1. The Lord of the Rings - J. R. R. Tolkien
2. The Two Towers - J. R. R. Tolkien
3. The Return of the King - J. R. R. Tolkien
4. The Hobbit - J. R. R. Tolkien
All these authors are exactly identical. So the algorithm will say they're duplicates, correct? In this case I'm not interested in sorting through this match, since they're all 'correct'. So they shouldn't be marked to show up in the results that the user needs to go through.
However, if your library has one variant that's not quite right:
1. The Lord of the Rings - J. R. R. Tolkien
2. The Two Towers - J. R. R. Tolkien
3. The Return of the King - J. R. R. Tolkien
4. The Hobbit - J. R. R. Tolkien
5. The Fellowship of the Ring - J.R.R. Tolkien
Then I have a group that should be marked, and since they're not exactly identical I actually want them to be displayed.
So what I'm saying is that with author searches, perfect dupes can and should be ignored, which is different from dupe searches focusing on the books themselves. Basically the intended outcome is inverted - with authors I'm trying to make more dupes, with books I'm trying to make fewer.
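To put that in rough code terms, here's a minimal Python sketch of the filter I have in mind. It's only an illustration, not how the plugin actually does it: `fuzzy_key` is a made-up stand-in for whatever fuzzy author match gets used (here it just drops periods and whitespace and ignores case), and `author_groups_to_show` is a hypothetical name.

```python
from collections import defaultdict


def fuzzy_key(author):
    # Hypothetical stand-in for the real fuzzy author match:
    # drop periods and whitespace, ignore case.
    return "".join(ch for ch in author.lower() if ch not in ". \t")


def author_groups_to_show(books):
    """books: iterable of (title, author) pairs.

    Returns only the groups whose members spell the author differently.
    """
    groups = defaultdict(list)
    for title, author in books:
        groups[fuzzy_key(author)].append((title, author))

    shown = []
    for members in groups.values():
        variants = {author for _, author in members}
        # One exact spelling across the whole group is a "perfect dupe":
        # nothing for the user to fix, so leave it out of the results.
        if len(variants) > 1:
            shown.append(members)
    return shown


clean = [
    ("The Lord of the Rings", "J. R. R. Tolkien"),
    ("The Two Towers", "J. R. R. Tolkien"),
    ("The Return of the King", "J. R. R. Tolkien"),
    ("The Hobbit", "J. R. R. Tolkien"),
]
mixed = clean + [("The Fellowship of the Ring", "J.R.R. Tolkien")]
```

With the two example libraries above, `author_groups_to_show(clean)` gives back no groups at all, while `author_groups_to_show(mixed)` gives back one group containing all five books. A book-duplicate search would do the opposite and keep the exact matches.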
Did I make any more sense, or did I miss a point earlier which makes this moot?
Agree with Chaley about the resolution with edit metadata; that's what I was thinking as well.
Last edited by ldolse; 04-16-2011 at 09:34 AM.