Old 04-16-2011, 07:33 AM   #77
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by kiwidude View Post
I'm not intending to change it at this point; your suggestion of getting wider feedback is valid. It is just that I wanted to add fuzzier author & title algorithms to make this plugin more useful. However, adding just a single "fuzzy title, fuzzy author" option might bring back way too many false positives. Maybe "similar title, fuzzy author" and "fuzzy title, similar author" would be the most useful variants of that.
You might be able to get around this by adding some ability to find fuzzily similar authors so that users can fix them to be all the same author. This way a user can fix up all their author records first, then do fuzzy/fuzzier title with exact author as a second pass.

The common variations are spaces present or missing between initials, initials being dropped or spelled out as full names, and names stored in author-sort (reversed) order. It seems like this would be best suited to this plugin, as you could use all the same logic you're using for duped books and just make larger groups by author.
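To make that concrete, here's a rough sketch of the kind of author matching I mean. This is not code from the plugin; `author_key` and `group_authors` are names I made up for illustration, and reducing each author to "surname + first initial" is just one assumed way to make the variations above collide.

```python
from collections import defaultdict


def author_key(name):
    """Hypothetical normalisation: lower-cased 'surname first-initial'."""
    name = name.strip()
    # Author-sort form "Tolkien, J. R. R." -> "J. R. R. Tolkien"
    if "," in name:
        last, _, first = name.partition(",")
        name = first.strip() + " " + last.strip()
    # Put a space after each period so "J.R.R." splits like "J. R. R."
    name = name.replace(".", ". ")
    tokens = [t.strip(".").lower() for t in name.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return ""
    surname, given = tokens[-1], tokens[:-1]
    # Keep only the first initial, so dropped or spelled-out names still match
    first_initial = given[0][0] if given else ""
    return (surname + " " + first_initial).strip()


def group_authors(names):
    """Return lists of author strings that share the same (assumed) key."""
    groups = defaultdict(list)
    for n in names:
        groups[author_key(n)].append(n)
    # Only groups with more than one spelling are worth showing to the user
    return [g for g in groups.values() if len(g) > 1]
```

Feeding it "J.R.R. Tolkien", "J. R. R. Tolkien", "Tolkien, J. R. R.", "John Ronald Reuel Tolkien" and "J. Tolkien" puts all five in one group. It is deliberately loose (two different authors with the same surname and first initial would also be grouped, and suffixes like "Jr." aren't handled), which is why the results would need to go to the user for review rather than being merged automatically.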

I know for myself I actually get more annoyed by my authors being messed up than by duplicate books. I keep dupes around all the time and just flag the poorer versions with a tag rather than merging/deleting them (crap books are good test candidates for heuristics), but it annoys me to no end trying to find all the messed up variants of the same author, which is basically a different variant of the duped books problem.