Quote:
Originally Posted by chaley
- if I run a test that finds one group, then mark that group as exempt, I get the message "No further duplicate groups exist for 'None'". If I subsequently run the test, I get "No duplicate groups were found using 'similar title, similar author'". Perhaps the 'None' was supposed to be 'similar title, similar author'?
|
Oops, I'll look into that. I confess to not testing "resolving the last duplicate" because I got lazy and tired of continually recreating duplicate scenarios to test.
Quote:
- I was unable to make anything break by pushing the clear button or by clearing the restriction. However, using the tag browser to do searches has the side effect of leaving duplicate_check mode when cycling through searches, because one of the states clears the search. I don't know if this is a problem, and if it is, I don't know how to fix it.
|
Darn it, I knew I would miss a permutation of that clear event and it would come back to bite me. Yes, it is a problem.
It is all caused by hooking into the wrong signal. What I am really interested in is the user clicking the clear button action on the toolbar, not the search being cleared. I have added all sorts of filth to the code to try to disconnect/connect around actions which result in the search being cleared, but as you say, that doesn't work when actions like clicking in the tag browser also clear the search, a scenario I can't distinguish from the others.
I would like to rip all my filth out and instead hook directly into the triggered signal of the clear search button action. Do you have any objections/thoughts on that? I should have pulled the pin on my current hacks and proposed this days ago, but I was playing whack-a-mole with the event triggering instead of stepping back for a fresh perspective.
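To illustrate the difference between the two hooks, here is a rough sketch in plain Python. It is not calibre or Qt code: `Signal`, `SearchBox`, `ClearAction`, `DuplicateMode` and `tag_browser_cycle` are all hypothetical stand-ins I made up for the example, and the only assumption is that the tag browser clears the search in one of its states, as described above.

```python
class Signal:
    """Tiny stand-in for a Qt signal: stores callbacks and calls them on emit."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        for slot in list(self._slots):
            slot(*args)


class SearchBox:
    """Hypothetical search box whose 'cleared' signal fires on ANY clear."""

    def __init__(self):
        self.cleared = Signal()
        self.text = ''

    def set_search(self, text):
        self.text = text
        if not text:
            self.cleared.emit()


class ClearAction:
    """Hypothetical toolbar action; 'triggered' fires only on a user click."""

    def __init__(self, search_box):
        self.triggered = Signal()
        self._search_box = search_box

    def trigger(self):
        self.triggered.emit()
        self._search_box.set_search('')


class DuplicateMode:
    """Tracks whether we are still in duplicate-checking mode."""

    def __init__(self):
        self.active = True

    def leave(self):
        self.active = False


def tag_browser_cycle(search_box):
    # Assumption: cycling through tag browser states sets a search and
    # then, in one of the states, clears it again.
    search_box.set_search('tags:"=Fiction"')
    search_box.set_search('')


box = SearchBox()
action = ClearAction(box)

hooked_on_cleared = DuplicateMode()
box.cleared.connect(hooked_on_cleared.leave)      # the current, fragile hook

hooked_on_triggered = DuplicateMode()
action.triggered.connect(hooked_on_triggered.leave)  # the proposed hook

tag_browser_cycle(box)
print(hooked_on_cleared.active, hooked_on_triggered.active)
```

With the hook on the cleared signal, the tag browser cycle kicks us out of duplicate mode. With the hook on the action's triggered signal, only an explicit `action.trigger()` (the user pressing the clear button) does, so none of the disconnect/connect juggling is needed.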
Quote:
I am still not convinced that we need author exemptions, much less to use them in book searches...
|
The problem with running only the algorithms currently in the plugin is that they do not help the user find books by the same author with a simple variation in initials/first name.
So to make this plugin more complete/useful imho we *need* an ignore title based search.
But the problem with trying to treat such searches as "book searches" is that our normal exemption model and grouping model do not fit. As I think we are all agreed, you will want to see all the books by the authors who have been found to be similar, so you can review which are genuine data entry/import errors and which are author names that, for whatever reason, you decide should not be treated as duplicates of each other.
It also sounds like we are in agreement that trying to apply such author based exemptions to book searches is a bad idea. So that takes one aspect of the complexity out.
Quote:
Finally, and probably a red herring, there are situations where S Smith and Steve Smith are in fact the same author, but listed differently on purpose. This happens all the time in academic papers, where the author name varies slightly from paper to paper. Do I need another kind of exemption to handle these?
|
Not an issue in my opinion. If you flag those two authors as exemptions you are saying to the plugin that you do not want those authors to be displayed again as duplicates of each other. Whether your reason is that they are different people or that they are variations you want to preserve is not relevant imho. The intention is that when you next run the author based search you are not faced with spending brain cycles on making that same choice again.
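As a minimal sketch of what I mean by an author exemption (not the plugin's actual data model; the function names and the idea of storing exemptions as unordered pairs are assumptions for illustration only):

```python
from itertools import combinations


def add_exemption(exemptions, author_a, author_b):
    """Record that two author names should never be paired as duplicates."""
    # frozenset makes the pair unordered: (a, b) is the same as (b, a).
    exemptions.add(frozenset((author_a, author_b)))


def unresolved_pairs(group, exemptions):
    """Return the author pairs in a group that have not been exempted."""
    return [pair for pair in combinations(sorted(group), 2)
            if frozenset(pair) not in exemptions]


def groups_to_review(groups, exemptions):
    """Keep only the groups that still contain an unresolved pair."""
    return [group for group in groups if unresolved_pairs(group, exemptions)]


exemptions = set()
add_exemption(exemptions, 'S Smith', 'Steve Smith')

groups = [
    {'S Smith', 'Steve Smith'},   # already decided, should not reappear
    {'J Doe', 'Jane Doe'},        # still needs a decision
]
remaining = groups_to_review(groups, exemptions)
print(remaining)
```

Note that the exemption records no reason. The same entry covers "different people" and "same person, variation kept on purpose", which is why I don't think a second kind of exemption is needed.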
Quote:
I do recognize that other people might want to work differently. There is nothing that forces me to use author exemptions. My argument against them is based mostly on complexity, especially as this code will be integrated into trunk, where it might be touched (maintained) by more than one person as calibre evolves.
|
Totally agree that maintenance is an issue to be potentially concerned with. Until I work through all the details I won't know how much of an impact this has. Obviously there is a lot of commonality, but there are significant differences as well.
I only started thinking through all the implications and how it would fit last night. For instance, when you are reviewing groups of authors, you are not going to want the "show all duplicates/highlight mode" option. Instead it will be one group at a time, and then the tag browser to filter within that group as you like, rename authors etc. So author searches either need a different dialog/menu option, or the Find duplicates dialog needs rearranging so that the options for how to view the results are either disabled or made a suboption of book based searches.
But I need to finish reviewing what is involved before I know for sure the impact. There is already a house of cards building up from the permutations of individual versus group review, and in particular from adding duplicate exemptions. I have no interest in making a rod for my own back or anyone else's by making this more complex than it is currently. However, I am convinced we do need ignore title searches, and if I have to rewrite the code I have done so far to support them, then better to do that now and get it sorted while it is fresh in my mind than down the track imho.