Quote:
Originally Posted by chaley
This plugin is fun to play with. 
Yes, it does have an element of playing the pokies about it: you pick a combination and pull the handle to see what turns up.
The soundex one for sure may need some refining. I had to tweak the algorithm from that link: it blew up on titles with non-ASCII characters in the names, so now I ignore those. There is also the question of what length to make the soundex: too short and your buckets are too big, too long and it might not be fuzzy enough. As a starting point I chose a length of 6 for titles and 8 for authors, but these were fairly arbitrary choices based on some random sampling.
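To make the length trade-off concrete, here is a rough sketch of a variable-length soundex in Python. It is not the plugin's actual code: the function name `soundex`, the default length, and the reading of "ignore" as "skip the non-ASCII characters" are all my assumptions for illustration.

```python
import string

# Standard Soundex digit groups; vowels and H, W, Y carry no code.
_CODES = {}
for digit, group in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in group:
        _CODES[ch] = str(digit)


def soundex(text, length=6):
    """Return a Soundex-style key of exactly `length` characters (hypothetical helper)."""
    # Assumption: non-ASCII characters are simply skipped rather than transliterated.
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    if not letters:
        return ""
    key = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        # Emit a digit only when it differs from the previous coded letter.
        if code and code != prev:
            key += code
        # H and W do not separate repeated codes; vowels do.
        if ch not in "HW":
            prev = code
    # Truncate or zero-pad to the requested length.
    return key[:length].ljust(length, "0")


print(soundex("Robert", 4), soundex("Roberts", 4))
print(soundex("Robert", 6), soundex("Roberts", 6))
```

At length 4 both "Robert" and "Roberts" land in the same bucket, while at length 6 the trailing "s" is enough to separate them, which is the bucket-size versus fuzziness trade-off in miniature.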
You could potentially expose this on the duplicate options dialog if you wanted to let users tune it to their liking. I guess it depends on how much control we want to offer, if any.
When soundex is applied to authors, I try to apply it to the surname first and then the rest. So "Robert Cross" and "Robert Ludlum" shouldn't appear together from a soundex match, but "Nora Roberts" and "N. Roberts" would.
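The surname-first idea can be sketched roughly like this. Again, this is only an illustration under my own assumptions: `author_key` is a made-up name, the compact `soundex` is a stand-in so the example runs on its own, and the last word of the name is assumed to be the surname.

```python
import re
import string

# Compact Soundex stand-in so the example runs on its own (assumed, not the plugin's code).
_CODES = {ch: str(d) for d, group in enumerate(
    ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1) for ch in group}


def soundex(text, length):
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    if not letters:
        return ""
    key, prev = letters[0], _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        if ch not in "HW":
            prev = code
    return key[:length].ljust(length, "0")


def author_key(author, length=8):
    """Soundex an author name with the surname moved to the front (hypothetical helper)."""
    tokens = re.findall(r"[^\W\d_]+", author)  # words only; drops "." and ","
    if not tokens:
        return ""
    # Assumes "First [Middle] Last" order, so the last token is the surname.
    reordered = [tokens[-1]] + tokens[:-1]
    return soundex("".join(reordered), length)


print(author_key("Robert Cross"), author_key("Robert Ludlum"))
print(author_key("Nora Roberts", 6), author_key("N. Roberts", 6))
```

Because the key starts with the surname, "Robert Cross" and "Robert Ludlum" begin with different letters and can never share a bucket. In this simplified version the two Roberts names collide at length 6 but not at length 8, since the extra letters of "Nora" still contribute codes at the longer length; the real tokenising may well handle initials differently.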