Quote:
Originally Posted by Starson17
Not if you think of this as "Show me all books that may be duplicates of Book 1." I don't have to think about anything except possible matches to Book 1. There's a possible duplicate set (1, 2) and another (2, 3), so if I work through the books in book order, and I'm working on Book 1 matches, I only have to decide if Book 1 is a match of 3 and not if Book 2 matches 3.
Finally I understand you, I think.
To test my understanding, does the following make sense? Assume that I have done a set-oriented test and now have a bunch of sets. If I were viewing by book, then when I ask 'show me matches for book X', I would show, all at once, every set containing X. This is a set union, not a transitive closure. In the example above, asking for matches of book 1, I would see book 3. Asking for matches of book 2, I would see book 3. Asking for matches of book 3, I would see books 1 and 2.
This is probably a very useful alternate view of the data, and I don't think it would be hard to build. All we would need is a map from each book to the sets that contain it (a book -> set map).
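A minimal sketch of that book -> set map and the per-book union, assuming duplicate sets are plain Python sets of book ids. The function names are hypothetical, not calibre APIs, and the sample sets {1, 3} and {2, 3} are chosen only because they reproduce the per-book results listed above.

```python
from collections import defaultdict


def build_book_to_sets(dup_sets):
    """Map each book id to the indices of the duplicate sets that contain it."""
    book_to_sets = defaultdict(list)
    for idx, dup_set in enumerate(dup_sets):
        for book in dup_set:
            book_to_sets[book].append(idx)
    return book_to_sets


def matches_for(book, dup_sets, book_to_sets):
    """Union of every set containing `book`, minus the book itself.

    This is a one-step union, not a transitive closure: books reachable
    only through another book's sets are not included.
    """
    result = set()
    for idx in book_to_sets.get(book, []):
        result |= dup_sets[idx]
    result.discard(book)
    return result


# Hypothetical duplicate sets that reproduce the per-book results above.
dup_sets = [{1, 3}, {2, 3}]
index = build_book_to_sets(dup_sets)
print(sorted(matches_for(1, dup_sets, index)))  # [3]
print(sorted(matches_for(2, dup_sets, index)))  # [3]
print(sorted(matches_for(3, dup_sets, index)))  # [1, 2]
```

With a transitive closure instead, book 1 would also be shown book 2 (via book 3). The union only looks one set deep.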
I also think that duplicate processing would happen when building the sets, but not when merging them (doing the union). I think this would give answers very close to what you describe in your second post (the more detailed example).
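The post doesn't say how a larger candidate set that contains a pair marked "not duplicates" should be processed at build time. The sketch below assumes one possible rule: split the set into one copy without each member of the offending pair, repeat until no exempt pair remains, and keep only maximal sets of two or more books. `split_on_exemptions` is a hypothetical helper, not an existing calibre function.

```python
def split_on_exemptions(group, exempt_pairs):
    """Split a candidate group so no resulting set holds an exempt pair.

    Assumed rule (illustrative only): while an exempt pair (a, b) lies inside
    a set, replace that set with (set - {a}) and (set - {b}).
    """
    # Only genuine two-book pairs count as exemptions.
    exempt = {frozenset(p) for p in exempt_pairs if len(set(p)) == 2}
    pending = [frozenset(group)]
    done = set()
    while pending:
        s = pending.pop()
        # Find any exempt pair lying entirely inside this set.
        bad = next((p for p in exempt if p <= s), None)
        if bad is None:
            if len(s) >= 2:  # a lone book is not a duplicate set
                done.add(s)
            continue
        # Split: one branch drops each member of the offending pair.
        for book in bad:
            pending.append(s - {book})
    # Keep only maximal sets; a strict subset of a survivor adds nothing.
    return [set(s) for s in done if not any(s < t for t in done)]


result = split_on_exemptions({1, 2, 3}, [(2, 3)])
print(sorted(sorted(s) for s in result))  # [[1, 2], [1, 3]]
```

Under this rule, the later union step never re-checks exemptions: asking for matches of book 1 would still show both 2 and 3, even though (2, 3) is marked "not duplicates", because book 1 is in both {1, 2} and {1, 3}.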
Quote:
Also, we've barely discussed what to do with multiple matching functions, which I suspect will need to be handled. If one matching function is author/title-based and I mark (2, 3) as "Not Duplicates", then later use a "Find all identical ISBN numbers" as a new matching function, should a (2, 3) match be ignored, even if they have identical ISBN numbers?
Who knows?
My guess is that duplicate processing must be optional. Fortunately, this isn't hard. Just don't do the post-pass.
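One way to make the processing optional per matching function, sketched under assumptions: each matching function groups books by a key (here a hypothetical ISBN field), and the "not duplicates" post-pass is an optional argument that is simply skipped when absent. The book records, ISBN values, and all function names are hypothetical placeholders, and `drop_exempt_pairs` only handles two-book sets; larger groups would need to be split rather than dropped.

```python
from collections import defaultdict


def candidate_sets(books, key_func, post_pass=None):
    """Group books with identical keys into candidate duplicate sets.

    `post_pass` is an optional callable taking and returning a list of sets.
    Passing None skips duplicate (exemption) processing entirely.
    """
    groups = defaultdict(set)
    for book_id, record in books.items():
        key = key_func(record)
        if key:  # ignore books with no key, e.g. a missing ISBN
            groups[key].add(book_id)
    sets = [g for g in groups.values() if len(g) >= 2]
    return post_pass(sets) if post_pass is not None else sets


def drop_exempt_pairs(exempt_pairs):
    """Build a simple post-pass that removes two-book sets marked 'not duplicates'."""
    exempt = {frozenset(p) for p in exempt_pairs}
    return lambda sets: [s for s in sets if frozenset(s) not in exempt]


def by_isbn(record):
    return record.get("isbn")


# Hypothetical records: books 2 and 3 share an ISBN.
books = {1: {"isbn": "A"}, 2: {"isbn": "B"}, 3: {"isbn": "B"}}

print(candidate_sets(books, by_isbn))  # [{2, 3}]
print(candidate_sets(books, by_isbn, drop_exempt_pairs([(2, 3)])))  # []
```

Whether an ISBN match should honor a "not duplicates" mark made under author/title matching then becomes a per-call choice rather than a fixed rule.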