Quote:
Originally Posted by chaley
Regarding transitivity, consider the following. Assume:
- a test that matches if two books contain one title word in common and 1 author in common.
- a book 'Ectoplasm' by Joe Blogs (book 1)
- a book 'Auras' by Patricia Posts (book 2)
- A book 'Ectoplasm and Auras' by Joe Blogs and Patricia Posts (book 3). This is an omnibus edition.
The test will identify books (1,3) and (2,3) as potential dupes. Transitivity would give us (1,2,3), which is clearly wrong, as 1 and 2 are definitely not dupes of each other. I am ignoring further levels of transitivity, which would expand the set even more.
The question then becomes which is better: showing all three, which might help identify the omnibus but requires some thought to ignore the (1,2) pair, or showing (1,3) and (2,3), which is the information the test actually found (and avoids the transitive closure problem). I don't have an answer. My guess is that this will come down to personal preference. Joy to the GUI man.
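To make the two options concrete, here is a minimal sketch of the example from the quote. It is not calibre code: the `BOOKS` data, the function names and the union-find grouping are all assumptions made up for illustration, and the test simply splits titles on whitespace (a real test would probably want to ignore words like "and").

```python
from itertools import combinations

# Hypothetical sample data mirroring the three books in the quoted example.
BOOKS = {
    1: {"title": "Ectoplasm", "authors": {"Joe Blogs"}},
    2: {"title": "Auras", "authors": {"Patricia Posts"}},
    3: {"title": "Ectoplasm and Auras",
        "authors": {"Joe Blogs", "Patricia Posts"}},
}


def title_words(title):
    """Lower-cased set of the words in a title."""
    return set(title.lower().split())


def is_match(a, b):
    """The quoted test: one title word and one author in common."""
    shared_words = title_words(a["title"]) & title_words(b["title"])
    shared_authors = a["authors"] & b["authors"]
    return bool(shared_words) and bool(shared_authors)


def matching_pairs(books):
    """Every pair of book ids that the test flags directly."""
    return [(i, j) for i, j in combinations(sorted(books), 2)
            if is_match(books[i], books[j])]


def transitive_groups(ids, pairs):
    """Merge the pairs into groups (transitive closure) via union-find."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Only groups with more than one book are potential dupes.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


pairs = matching_pairs(BOOKS)
groups = transitive_groups(BOOKS, pairs)
print(pairs)   # [(1, 3), (2, 3)]
print(groups)  # [[1, 2, 3]]
```

The pair list is what the test actually found. The group list is the transitive version, which pulls books 1 and 2 into the same set even though the test never matched them against each other.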
I am a proponent of the theory that we should start with something that is not perfect but works and is relatively easy to implement; then we can use it and discuss how to improve the results. This is how Calibre is developed ;-)
So, at the moment I would be extremely happy if I got the result (1,2,3). I would have to go through the results anyway, and this would be *much* quicker than going through the entire collection author by author, checking for fuzziness in the author names (that is, King Stephen; Stephen King; S. King; King, S.; S KING ...).
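As a rough idea of what an "imperfect but works" author comparison could look like, here is a small sketch. The function names are made up and this is not how calibre does it. It assumes that word order and punctuation do not matter and that a single letter stands for any name starting with that letter, so it will over-match (S. King would also match Sarah King), which is acceptable if a human checks the results anyway.

```python
import re
from itertools import permutations


def name_tokens(name):
    """Lower-cased name parts with punctuation and digits dropped."""
    return re.findall(r"[^\W\d_]+", name.lower())


def tokens_compatible(a, b):
    """An initial matches any name part starting with the same letter."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def same_author(x, y):
    """True if the two names could plausibly be the same person."""
    tx, ty = name_tokens(x), name_tokens(y)
    if not tx or len(tx) != len(ty):
        return False
    # Names are short, so trying every ordering of the parts is cheap.
    return any(
        all(tokens_compatible(a, b) for a, b in zip(tx, perm))
        for perm in permutations(ty)
    )


variants = ["King Stephen", "Stephen King", "S. King", "King, S.", "S KING"]
all_match = all(same_author(a, b) for a in variants for b in variants)
print(all_match)                                    # True
print(same_author("Stephen King", "Stephen Kong"))  # False
```

With something like this, the five spellings above would land in one bucket, and the bucket could then be reviewed by hand.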