Old 02-10-2011, 10:01 AM   #79
chaley
Quote:
Originally Posted by Starson17 View Post
It's worth considering how a duplicate finder is likely to be used. Will it be used only to find and permanently merge or eliminate duplicates? Or will it also be used as some sort of pseudo search extension?

If the search functions for duplicates include soundex functionality (similar-sounding names, i.e. fuzzy matching) that isn't implemented in the search bar, we may want to be able to disable the false-positive removal, or implement the duplicate-finding functions in the search bar.
This is a good idea, and not disallowed by the schemes being discussed. A search would simply produce a set of books, and I don't see any need to run known-duplicate processing on that set.

I should point out that as it is, search is not capable of comparing a given book against all books in the library. Some serious work would be required to be able to ask the question "find all books that are like this one". If the fuzzy searches are invertible (that is, the fuzzy key can be determined from a book's own data), then I can see generating a fuzzy-search expression that produces a list of matches. However, if the fuzzy searches are one-way, where some algorithm is applied across the library and some number of books 'win', then things are much more interesting.
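To make the distinction concrete, here is a rough sketch of the two cases. Everything in it is an assumption for illustration: books are plain dicts with hypothetical 'id', 'title' and 'author' fields, not calibre's database API, and the soundex routine is a simplified version of the classic algorithm. In the invertible case the key is computed from one book alone, so matching is just a lookup. In the one-way case every other book has to be scored against the target and the top few 'win'.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Letter-to-digit table for a simplified classic Soundex.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return a 4-character Soundex-style code, or '' if word has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # vowels reset prev, so repeated codes can reappear
    return (out + "000")[:4]


def fuzzy_key(book):
    """Invertible case: the key depends only on this book's own data."""
    author_words = book["author"].split()
    surname = soundex(author_words[-1]) if author_words else ""
    title = tuple(soundex(w) for w in book["title"].split())
    return (surname, title)


def build_index(books):
    """Map each fuzzy key to the set of book ids that share it."""
    index = defaultdict(set)
    for book in books:
        index[fuzzy_key(book)].add(book["id"])
    return index


def find_like(book, index):
    """'Find all books like this one' as a key lookup, excluding the book itself."""
    return index.get(fuzzy_key(book), set()) - {book["id"]}


def closest_titles(book, books, n=2):
    """One-way case: score every other book and keep the n best."""
    scored = [
        (SequenceMatcher(None, book["title"].lower(),
                         other["title"].lower()).ratio(), other["id"])
        for other in books if other["id"] != book["id"]
    ]
    scored.sort(reverse=True)
    return [book_id for _, book_id in scored[:n]]


# Hypothetical sample data.
LIBRARY = [
    {"id": 1, "title": "The Hobbit", "author": "J. R. R. Tolkien"},
    {"id": 2, "title": "The Hobit", "author": "J.R.R. Tolkein"},
    {"id": 3, "title": "Dune", "author": "Frank Herbert"},
]
```

With this sample data, find_like(LIBRARY[0], build_index(LIBRARY)) gives the set containing only book 2. The first approach could in principle be turned into a search expression built from the key; the second cannot, because there is no key to search for, only a ranking.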
Quote:
I know that at some point I'm going to find a group of near duplicates that I don't want to merge and do want to eliminate from further duplicate searches, but which I later want to find as a group simply because I remember I found that group once before and I want to see it again.
It seems that you are saying that you want the option to not do known-duplicate processing. That should be easy enough for the GUI-man.
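As a rough illustration of what that option might look like, here is a minimal sketch. It assumes, purely for the example, that candidate groups and previously recorded "not duplicates" groups are both sets of book ids; the function name and the skip_known flag are hypothetical, and the real exemption scheme may well differ.

```python
def visible_groups(groups, exemptions, skip_known=True):
    """Return candidate duplicate groups.

    groups     -- iterable of frozensets of book ids found by a search
    exemptions -- iterable of frozensets previously marked 'not duplicates'
    skip_known -- when True, hide any group fully covered by an exemption;
                  when False, return every group untouched
    """
    if not skip_known:
        return list(groups)
    exemptions = list(exemptions)
    return [g for g in groups
            if not any(g <= known for known in exemptions)]
```

Calling it with skip_known=False would bring a previously dismissed group back into view without having to forget the exemption itself.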