Old 02-11-2011, 09:50 AM   #87
chaley
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by Starson17
Not if you think of this as "Show me all books that may be duplicates of Book 1." I don't have to think about anything except possible matches to Book 1. There's a possible duplicate set (1, 2) and another (2, 3), so if I work through the books in book order, and I'm working on Book 1 matches, I only have to decide if Book 1 is a match of 3 and not if Book 2 matches 3.
Finally I understand you, I think.

To test my understanding, does the following make sense? Assume that I have done a set-oriented test, and now have a bunch of sets. If I were viewing by book, then when I ask 'show me matches for book X', I would show in one go all the sets containing X. This is a set union, not a transitive closure. In the example above, asking for matches of book 1, I would see book 3. Asking for matches of book 2, I would see book 3. Asking for matches of book 3, I would see books 1 and 2.

This is probably a very useful alternate visualization of the data. I don't think it would be hard. All we would need would be a book -> set map.
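
A rough sketch of that book -> set map, in Python. The function names are made up for illustration, and the two sets {1, 3} and {2, 3} are an assumption chosen only because they reproduce the answers described above; this is not code from any existing implementation.

```python
from collections import defaultdict

def build_book_to_sets(duplicate_sets):
    """Map each book id to the indexes of the sets that contain it."""
    book_to_sets = defaultdict(list)
    for index, group in enumerate(duplicate_sets):
        for book_id in group:
            book_to_sets[book_id].append(index)
    return book_to_sets

def matches_for(book_id, duplicate_sets, book_to_sets):
    """Union of every set containing book_id, minus the book itself.

    This is a plain union of the sets that mention the book. It does
    not follow chains through other books (no transitive closure).
    """
    result = set()
    for index in book_to_sets.get(book_id, []):
        result |= duplicate_sets[index]
    result.discard(book_id)
    return result

# Assumed example data: two sets that share book 3.
duplicate_sets = [{1, 3}, {2, 3}]
book_to_sets = build_book_to_sets(duplicate_sets)

print(matches_for(1, duplicate_sets, book_to_sets))  # {3}
print(matches_for(2, duplicate_sets, book_to_sets))  # {3}
print(matches_for(3, duplicate_sets, book_to_sets))  # {1, 2}
```

Book 1 never shows book 2 here, even though both match book 3, which is the difference between the union and a transitive closure.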

I also think that duplicate processing would happen when building the sets, but would not happen when merging the sets (doing the union). I think that this would give answers very close to what you describe in your second post (the more detailed example).
Quote:
Also, we've barely discussed what to do with multiple matching functions, which I suspect will need to be handled. If one matching function is author/title based and I mark (2, 3) as "Not Duplicates", then later use a "Find all identical ISBN numbers" as a new matching function, should a (2, 3) match be ignored, even if they have identical ISBN numbers?
Who knows?

My guess is that duplicate processing must be optional. Fortunately, this isn't hard. Just don't do the post-pass.
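
A sketch of what "optional" could look like, in Python. It assumes the post-pass is the step that applies the "Not Duplicates" markings to sets that have already been built, and it assumes one possible splitting rule (a set holding both books of a marked pair becomes two sets, each missing one of them). The names `apply_not_duplicates`, `find_duplicate_sets` and `use_post_pass` are hypothetical.

```python
def apply_not_duplicates(duplicate_sets, not_duplicate_pairs):
    """Post-pass: split any set that holds both books of a marked pair."""
    result = [set(group) for group in duplicate_sets]
    for a, b in not_duplicate_pairs:
        next_result = []
        for group in result:
            if a in group and b in group:
                # Keep the two books apart: one copy without a, one without b.
                next_result.append(group - {a})
                next_result.append(group - {b})
            else:
                next_result.append(group)
        result = next_result
    # Drop sets that no longer hold at least two books, and exact repeats.
    unique = []
    for group in result:
        if len(group) >= 2 and group not in unique:
            unique.append(group)
    return unique

def find_duplicate_sets(raw_sets, not_duplicate_pairs, use_post_pass=True):
    """Return the raw sets untouched when the post-pass is switched off."""
    if not use_post_pass:
        return [set(group) for group in raw_sets]
    return apply_not_duplicates(raw_sets, not_duplicate_pairs)

raw_sets = [{1, 2, 3}]
marked = [(1, 2)]  # books 1 and 2 were marked "Not Duplicates"

print(find_duplicate_sets(raw_sets, marked))                       # [{2, 3}, {1, 3}]
print(find_duplicate_sets(raw_sets, marked, use_post_pass=False))  # [{1, 2, 3}]
```

With the flag off, a different matching function (the ISBN one, say) would simply report its sets and ignore the earlier markings. Whether that is the right default is still the open question.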