A couple more thoughts on the false positive stuff:
- Why was it thought necessary to include the algorithm when storing a pair as a false positive match? Surely if I say that books 1 and 2 are not duplicates, that should be the end of it as far as that pair is concerned. If they are flagged by a Fuzzy Title/Exact Author search, the same pair will come up again when I do a Fuzzy Title/Fuzzy Author search, so I would have to exclude them again, and again with every new algorithm, which rather defeats the point of marking a false positive in the first place. The only case I can think of, which is perhaps what Starson17 mentioned, is if you added an "ISBN match" algorithm and it found that 1 and 2 had the same ISBN. That may indicate a problem with your metadata, but as this plugin is about resolving duplicates and you have already said 1 and 2 are not duplicates, they still should not appear imho. I actually think "ISBN match" belongs in the "Quality Check" plugin, which would help you spot and fix a 1/2 ISBN clash even after you had indicated they should not appear as duplicates. (A rough sketch of the algorithm-independent exemption I have in mind follows after this list.)
- If we allow book-based review, you could get into a mess with false positives. Say (1,2) and (1,3) are found, so you review book 1, which displays (1,2,3). The user could, out of confusion, select (2,3) and say "Mark as not duplicates". Of course (2,3) was never considered a duplicate in the first place, and if they run the search again without resolving (1,2) or (1,3), then 2 and 3 will still be displayed on screen in a book-based review. Perhaps the "Mark as not duplicates" menu item needs some validation before being enabled, to make sure the selection only covers a pair the plugin actually identified as a duplicate (a sketch of such a check follows below). If the root (1 in this case) were visually highlighted, the user would have a better visual cue that they should only mark a pair as not duplicates if that highlighted root item is one of them. Again, with set-based review this issue does not arise.
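On the first point, here is a minimal sketch of what I mean by storing exemptions as algorithm-independent pairs. The class and function names are just illustrative, not the plugin's actual API:

```python
def pair_key(book_id_a, book_id_b):
    """Store an exemption as an unordered pair, with no algorithm attached."""
    return frozenset((book_id_a, book_id_b))

class FalsePositiveStore:
    def __init__(self):
        self._exempt_pairs = set()

    def mark_not_duplicates(self, book_id_a, book_id_b):
        # No algorithm recorded: "not duplicates" applies to every search.
        self._exempt_pairs.add(pair_key(book_id_a, book_id_b))

    def is_exempt(self, book_id_a, book_id_b):
        return pair_key(book_id_a, book_id_b) in self._exempt_pairs

def filter_candidates(candidate_pairs, store):
    """Any algorithm (Fuzzy Title/Exact Author, Fuzzy Title/Fuzzy Author,
    even a future ISBN match) filters its results through the same check."""
    return [(a, b) for (a, b) in candidate_pairs if not store.is_exempt(a, b)]
```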
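And on the second point, a sketch of the kind of enablement check I mean for "Mark as not duplicates". It assumes the current search's identified pairs are available as a simple collection like {(1, 2), (1, 3)}; again the names are hypothetical, not the plugin's real code:

```python
from itertools import combinations

def can_mark_not_duplicates(selected_ids, found_pairs):
    """Enable the menu item only if every pairing within the selection
    was actually reported as a duplicate by the current search."""
    found = {frozenset(pair) for pair in found_pairs}
    selected = list(selected_ids)
    if len(selected) < 2:
        return False
    # Selecting (2, 3) from the book-1 group {1, 2, 3} fails here, because
    # only (1, 2) and (1, 3) were ever identified as duplicate pairs.
    return all(frozenset(pair) in found for pair in combinations(selected, 2))

# e.g. can_mark_not_duplicates([2, 3], [(1, 2), (1, 3)]) -> False
#      can_mark_not_duplicates([1, 2], [(1, 2), (1, 3)]) -> True
```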