02-11-2011, 03:05 AM   #82
kacir
Quote:
Originally Posted by chaley
Regarding transitivity, consider the following. Assume:
- a test that matches if two books have at least one title word and one author in common.
- a book 'Ectoplasm' by Joe Blogs (book 1)
- a book 'Auras' by Patricia Posts (book 2)
- a book 'Ectoplasm and Auras' by Joe Blogs and Patricia Posts (book 3). This is an omnibus edition.

The test will identify books (1,3) and (2,3) as potential dupes. Transitivity would give us (1,2,3), which is clearly wrong, as 1 and 2 are definitely not dupes of each other. I am ignoring further levels of transitivity, which would expand the set even more.

The question then becomes which is better: showing all three, which might help identify the omnibus but requires some thought to ignore the (1,2) pair, or showing (1,3) (2,3), which is the information the test actually found (and avoids the transitive closure problem). I don't have an answer. My guess is that this will come down to personal preference. Joy to the GUI man.
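To make the two presentations concrete, here is a minimal Python sketch of the example from the quote. The record layout, the `is_match` rule and the function names are assumptions made up for illustration; this is not calibre's actual code. It prints the raw pairs the test finds and then the groups obtained by merging overlapping pairs (the transitive closure).

```python
from itertools import combinations

# Hypothetical records built from the example in the quote.
BOOKS = {
    1: {"title": "Ectoplasm", "authors": {"Joe Blogs"}},
    2: {"title": "Auras", "authors": {"Patricia Posts"}},
    3: {"title": "Ectoplasm and Auras",
        "authors": {"Joe Blogs", "Patricia Posts"}},
}


def title_words(title):
    """Lower-cased set of the words in a title."""
    return {word.lower() for word in title.split()}


def is_match(a, b):
    """Assumed test: at least one shared title word and one shared author."""
    shared_words = title_words(a["title"]) & title_words(b["title"])
    shared_authors = a["authors"] & b["authors"]
    return bool(shared_words) and bool(shared_authors)


def matching_pairs(books):
    """Every pair of book ids that passes the test."""
    return [
        (i, j)
        for i, j in combinations(sorted(books), 2)
        if is_match(books[i], books[j])
    ]


def transitive_groups(pairs):
    """Merge overlapping pairs into groups using a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(group) for group in groups.values())


pairs = matching_pairs(BOOKS)
groups = transitive_groups(pairs)
print(pairs)   # the pairs the test actually found
print(groups)  # the same pairs after transitive merging
```

With these three books the pair list is (1,3) and (2,3), while the merged view is the single group (1,2,3), which contains the (1,2) combination that the test itself never reported.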
I am a proponent of the theory that we should start with something that is not perfect but works and is relatively easy to implement. Then we can use it and discuss how to improve the result. This is how Calibre is developed ;-)

So, at the moment I would be extremely happy if I got the result (1,2,3). I would have to go through the results anyway, and this would be *much* quicker than going through the entire collection author after author, checking for fuzziness in the author name (that is: King Stephen; Stephen King; S. King; King, S.; S KING ...)
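As a rough illustration of the author-name fuzziness, here is a small Python sketch that treats two names as the same author when their words can be paired up in some order, with each pair either equal or one being the initial of the other. The function names and the rule itself are assumptions for illustration only, not anything calibre does.

```python
import re
from itertools import permutations


def name_tokens(name):
    """Lower-cased words of a name; punctuation and digits are dropped."""
    return re.findall(r"[^\W\d_]+", name.lower())


def tokens_compatible(a, b):
    """True if the words are equal or one is a one-letter initial of the other."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)


def same_author(name1, name2):
    """Assumed fuzzy rule: some ordering of the words pairs up compatibly."""
    t1, t2 = name_tokens(name1), name_tokens(name2)
    if not t1 or len(t1) != len(t2):
        return False
    return any(
        all(tokens_compatible(a, b) for a, b in zip(t1, perm))
        for perm in permutations(t2)
    )


variants = ["King Stephen", "Stephen King", "S. King", "King, S.", "S KING"]
all_match = all(same_author(a, b) for a in variants for b in variants)
print(all_match)
print(same_author("Stephen King", "Stephen Baxter"))
```

This deliberately errs on the side of too many matches ("S. King" would also match "Sam King"), which fits the idea of getting a rough candidate list first and reviewing it by hand.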