OK, here is the progress so far. The big-ticket item still missing is handling false positives; the rest of it is pretty much there. So far I've put in five different algorithms:
- ISBN exact match
- Ignore author, exact title (Calibre add default logic)
- Ignore author, similar title
- Exact author, similar title (Calibre automerge logic)
- Similar author, similar title (Author using Kovid's new metadata logic)
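To give a feel for the two exact-match cases, here is a minimal sketch of grouping books by a comparison key. It is not the plugin's actual code: the `books` dict layout, the function names, and the lowercasing and hyphen stripping are all assumptions made for illustration. The "similar" algorithms would need a fuzzier key and are not shown.

```python
from collections import defaultdict

def find_duplicate_groups(books, key_func):
    """Group book ids whose key_func values are equal; keep groups of 2+."""
    buckets = defaultdict(list)
    for book_id, book in books.items():
        key = key_func(book)
        if key:  # skip books with no usable key (e.g. a missing ISBN)
            buckets[key].append(book_id)
    return [sorted(ids) for ids in buckets.values() if len(ids) > 1]

def isbn_key(book):
    # Hypothetical normalisation: drop hyphens and surrounding whitespace.
    return (book.get("isbn") or "").replace("-", "").strip()

def exact_title_key(book):
    # Hypothetical normalisation: the author is ignored entirely and the
    # title is compared case-insensitively.
    return book.get("title", "").strip().lower()

# Made-up sample library: book id -> metadata.
books = {
    1: {"title": "Dune", "author": "Frank Herbert", "isbn": "978-0441013593"},
    2: {"title": "dune", "author": "Herbert, Frank", "isbn": ""},
    3: {"title": "Emma", "author": "Jane Austen", "isbn": "9780441013593"},
}

isbn_groups = find_duplicate_groups(books, isbn_key)
title_groups = find_duplicate_groups(books, exact_title_key)
```

With this sample data, books 1 and 3 share an ISBN and books 1 and 2 share a title, so each algorithm reports a different group.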
It returns the matches, and you can navigate through them either by clicking on the toolbar button (ctrl+click on the button for previous) or using keyboard shortcuts (ctrl+\ for next result, ctrl+shift+\ for previous). Groups are dynamically removed every time you move to the next result.
False positives are my next issue. The first part, which perhaps Chaley has thoughts on, is deciding how to store them. The simple case is that a user marks (1,2) as not dups, then (1,3) as not dups. I could store those as tuples in a list for the library, like [(1,2),(1,3)].
However, in theory we could allow the user to select more than 2 rows, for instance (1,2,3). Now what does that actually "mean"? Presumably the net effect is indicating [(1,2),(1,3),(2,3)]. Do we break it down and store it that way, or do we store it as [(1,2,3)]?
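For what it's worth, the "break it down" option is cheap to compute. A rough sketch, assuming book ids are plain integers (the function name is made up):

```python
from itertools import combinations

def flatten_to_pairs(selection):
    """Expand a multi-row selection into every unordered pair of ids."""
    ids = sorted(set(selection))  # drop repeats, fix the order within pairs
    return list(combinations(ids, 2))

pairs = flatten_to_pairs((1, 2, 3))
# pairs is [(1, 2), (1, 3), (2, 3)]
```

The catch is that the number of pairs grows as n*(n-1)/2, so a 10-row selection already becomes 45 stored pairs.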
Now what happens if the user then selects (1,2,3,4), which is obviously a superset? Or what if they select (1,2,5), which only partly overlaps? I haven't yet got my head around whether we should flatten these out.
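One way to think about it: if everything is flattened into a set of pairs, then supersets and partial overlaps stop being special cases, because adding a pair that is already there does nothing. A sketch of that idea only, with a made-up class name and no persistence:

```python
from itertools import combinations

class ExemptionStore:
    """Holds 'not duplicates' markings as a set of (low_id, high_id) pairs."""

    def __init__(self):
        self.pairs = set()

    def mark_not_duplicates(self, selection):
        # Flatten the selection; pairs that already exist are simply re-added.
        for pair in combinations(sorted(set(selection)), 2):
            self.pairs.add(pair)

    def is_exempt(self, a, b):
        return (min(a, b), max(a, b)) in self.pairs

store = ExemptionStore()
store.mark_not_duplicates((1, 2, 3))      # 3 pairs
store.mark_not_duplicates((1, 2, 3, 4))   # superset: adds (1,4), (2,4), (3,4)
store.mark_not_duplicates((1, 2, 5))      # overlap: adds (1,5), (2,5)
```

Note that after this, 3 and 5 are still not exempt from each other, which is arguably what the user meant. Storing [(1,2,3)] style groups instead would need explicit merge rules to get the same answer.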
Presumably we also need to cater for the user screwing up, by allowing them to "undo" a not-duplicate set. And how could we show the user visually what duplicate exemptions they have in place? I think that might need a special screen to be built, which would also let them delete exemptions.
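Both the undo and the "show me my exemptions" screen get easier if the data is keyed per book. Here is a rough sketch assuming a dict that maps each book id to the set of ids it has been marked as not a duplicate of. All of the names are hypothetical and nothing here touches the real library database.

```python
from collections import defaultdict

def add_exemption(exemptions, a, b):
    """Record that books a and b are not duplicates (stored both ways)."""
    exemptions[a].add(b)
    exemptions[b].add(a)

def remove_exemption(exemptions, a, b):
    """Undo a single marking; unknown pairs are ignored."""
    for x, y in ((a, b), (b, a)):
        if x in exemptions:
            exemptions[x].discard(y)
            if not exemptions[x]:
                del exemptions[x]  # keep the map free of empty entries

def exemptions_for(exemptions, book_id):
    """What a per-book exemptions screen could list for one book."""
    return sorted(exemptions.get(book_id, ()))

exemptions = defaultdict(set)
add_exemption(exemptions, 1, 2)
add_exemption(exemptions, 1, 3)
before_undo = exemptions_for(exemptions, 1)   # [2, 3]
remove_exemption(exemptions, 1, 3)            # user changed their mind
after_undo = exemptions_for(exemptions, 1)    # [2]
```

Storing each pair in both directions costs double the space, but it means the screen for any one book is a single lookup.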
Feedback as always appreciated, be it on the code, usability or whatever. So far it is fairly simple I think.