#151 | |
Grand Sorcerer
Posts: 12,447
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Code:
books known not to be duplicates [(3, 4), (3, 5)]
candidate duplicates [1, 2, 3, 4, 5, 6]
After partioning [[1, 2, 3, 6], [1, 2, 4, 5, 6]]
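For readers following along, here is a minimal sketch of one way to get that partitioning. The names `partition_using_exemptions` and `not_duplicate_of_map` are borrowed from the traceback later in this thread, but the body is an assumption written for illustration, not the plugin's actual code; `build_not_duplicate_of_map` is a hypothetical helper.

```python
from collections import defaultdict


def build_not_duplicate_of_map(pairs):
    """Hypothetical helper: turn exemption pairs into a symmetric lookup."""
    not_duplicate_of_map = defaultdict(set)
    for a, b in pairs:
        not_duplicate_of_map[a].add(b)
        not_duplicate_of_map[b].add(a)
    return not_duplicate_of_map


def partition_using_exemptions(candidates, not_duplicate_of_map):
    """Split one candidate group into the largest groups in which no two
    books have been marked as 'not duplicates' of each other.

    Illustrative sketch only; worst-case cost grows quickly with the
    number of exemptions.
    """
    groups = [set()]
    for book in candidates:
        exempt = not_duplicate_of_map.get(book, set())
        next_groups = []
        for group in groups:
            if group & exempt:
                # Conflict: keep the old group, and fork a copy without
                # the conflicting books that the new book can join.
                next_groups.append(group)
                next_groups.append((group - exempt) | {book})
            else:
                next_groups.append(group | {book})
        # Drop any group that is contained in a group we already kept.
        groups = []
        for group in sorted(next_groups, key=len, reverse=True):
            if not any(group <= kept for kept in groups):
                groups.append(group)
    # A "group" of one book is not a duplicate group.
    return sorted(sorted(group) for group in groups if len(group) > 1)


exemptions = build_not_duplicate_of_map([(3, 4), (3, 5)])
print(partition_using_exemptions([1, 2, 3, 4, 5, 6], exemptions))
```

With the inputs from the output above, book 3 ends up in one group and books 4 and 5 in the other, while the unexempted books 1, 2 and 6 appear in both.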
|
|
#152 |
Calibre Plugins Developer
Posts: 4,730
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Hey Charles,
Yeah, I eventually figured out that the map loading in the test code was suspect after I put my own algorithm in and got unexpected results, haha. Your new version takes the same approach I took, but with more concise code in that set magic in the middle, so I shall steal it verbatim, thanks!
#153 |
Calibre Plugins Developer
Posts: 4,730
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Do you have a preference on the GUI?
Last edited by kiwidude; 04-25-2011 at 05:41 AM. |
#154 |
Grand Sorcerer
Posts: 12,447
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
I assume this is aimed at me?
Yes. C) None of the above.

I prefer screenshot 2, but changed to something like the following:

I didn't bother to write code to center the radio buttons in the grid.

This example shows 2 things. The first is the use of common column labels, making explicit the coupling implicit in columns. The second is introduction of soundex for author names, which seems reasonable, given that soundex was invented for names.

I prefer the column layout even if you don't want to add soundex for authors. In this case I would have two empty holes in the grid. What I am trying to achieve is common headers.

An alternate that might work better for me is shown below. This one avoids the notion of columns and rows, instead using groups. I think this layout reduces the semantic coupling between the two sets of options caused by the row/column layout.

However, both of your proposals work. I won't be unhappy if you pick either one.

Last edited by chaley; 04-25-2011 at 07:46 AM. Reason: Make last sentence clearer.
#155 | |
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Quote:
Kiwidude, of the two you presented, I like the one above best.
|
#156 |
Calibre Plugins Developer
Posts: 4,730
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Hee, hee.
Ok, attached is your first variation. I do agree with you that the second variation (which is what you suggested ages ago) is probably the better bet, though. The only thing I had against it was ending up with a more "vertical" dialog. However, thinking about it now, it really wouldn't be that much taller. We shall see what I end up with when I push 0.6... haha. The plumbing is all done, supporting all the various permutations etc.; I just need to start tuning the implementations of the algorithms.

@dwanthny - thx, and I agree that of the two I originally presented, the one you picked was my preference too. However, I could be swayed into the second of chaley's suggestions. I just think that with the approach I have taken the brain has to do a little more cross-referencing (particularly when stripping out the second row of titles). It is more concise in layout, but perhaps at the expense of ease of use.

Last edited by kiwidude; 04-25-2011 at 07:57 AM.
#157 |
Calibre Plugins Developer
Posts: 4,730
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
v0.6 Beta
Changes in this release:
For the more technically minded (or interested), you can now find all the algorithms and the test code/cases for them in "algorithms.py" in the zip file. You can run this yourself with "calibre-debug -e algorithms.py", so you can see the range of permutations I currently test for and those that I still expect not to be caught.

In terms of the examples posted earlier on this thread, I think all of them can now be found by one algorithm or another, with the exception of this one:

Foundation 5 - Foundation and Earth
Foundation and Earth

It will however find this:

Foundation and Earth - Foundation 5
Foundation and Earth

Of course it is pretty easy to do a sanity check on your library using "title:-" or the Quality Check plugin to detect such cases and fix them before you do your duplicate run.

Look forward to hearing what you think. My todo list with this is now done - with the possible exception of some slightly improved tag browser

Last edited by kiwidude; 04-25-2011 at 04:57 PM. Reason: Updated to 0.6.1
#158 |
Grand Sorcerer
Posts: 12,447
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
This plugin is fun to play with.
Problem: exception using title:soundex, Author:ignore.

Edit: it happens for all title != ignore.

Code:
calibre, version 0.7.57
ERROR: Unhandled exception: <b>NameError</b>:global name 'not_duplicate_of_map' is not defined

Traceback (most recent call last):
  File "calibre_plugins.find_duplicates.action", line 155, in toolbar_button_clicked
  File "calibre_plugins.find_duplicates.action", line 150, in find_duplicates
  File "calibre_plugins.find_duplicates.duplicates", line 79, in run_duplicate_check
  File "calibre_plugins.find_duplicates.algorithms", line 285, in run_duplicate_check
  File "calibre_plugins.find_duplicates.algorithms", line 321, in convert_candidates_to_groups
  File "calibre_plugins.find_duplicates.algorithms", line 383, in partition_using_exemptions
NameError: global name 'not_duplicate_of_map' is not defined
#159 |
Calibre Plugins Developer
Posts: 4,730
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Oops - didn't do a very good job of pasting in your code now, did I? lol.
New version updated on the previous post. |
#160 |
Grand Sorcerer
Posts: 12,447
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
This is really a lot of fun. I tried soundex title, ignore author, and it finds very surprising things. I have a lot of books in French, and it matches these against the English titles (no surprise). I actually found 2 new duplicates. However, I am mystified why "20,000 Leagues under the Sea" matches "L'Assassin du roi".
@kiwidude: this is really good stuff. I suggest that you make it generally available as soon as you are comfortable with doing so.
#161 |
Calibre Plugins Developer
Posts: 4,730
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Yes, it does have an element of playing the pokies about it - you pick a combination and pull the handle to see what turns up.

The soundex is one for sure that may need some refining. I had to tweak the algorithm off that link - it blew up on titles with non-ASCII characters in the names, so now I ignore those. There is also the question of what length to make the soundex - too short and your buckets are too big, too long and it might not be fuzzy enough. As a starting point I chose a length of 6 for titles and 8 for authors, but these were relatively arbitrary, based on some random sampling. You could potentially expose this on the duplicate options dialog, I guess, if you wanted to allow users to tune to their liking? I guess it depends on how much control we want to offer, if any.

When soundex is applied to authors, I try to apply it to the surname first and then the rest. So if you had "Robert Cross" and "Robert Ludlum" they shouldn't appear together from a soundex match, but "Nora Roberts" and "N. Roberts" would.
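To make the length trade-off concrete, here is a rough sketch of a Soundex with a configurable code length. It follows the classic Soundex letter groups and simply drops anything that is not an ASCII letter; the plugin's tweaked version may well differ in its details, so treat the function name and its behaviour as assumptions for illustration only.

```python
import re

# Classic Soundex digit groups. Vowels and h/w/y carry no digit.
_SOUNDEX_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(text, length=4):
    """Return a Soundex-style code of exactly `length` characters.

    Illustrative sketch: non-ASCII letters, digits, spaces and
    punctuation are ignored rather than transliterated.
    """
    letters = re.sub(r"[^a-z]", "", text.lower())
    if not letters:
        return ""
    result = letters[0].upper()
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if len(result) == length:
            break
        digit = _SOUNDEX_CODES.get(ch, "")
        # Only emit a digit when it differs from the one just before it.
        if digit and digit != previous:
            result += digit
        # h and w do not separate two letters with the same digit.
        if ch not in "hw":
            previous = digit
    return result.ljust(length, "0")


for size in (6, 8):
    print(size,
          soundex("20,000 Leagues under the Sea", size),
          soundex("L'Assassin du roi", size))
```

Under these assumptions the two titles chaley mentioned fall into the same bucket at length 6 and into different buckets at length 8, which is the kind of tuning a spinbox on the options dialog would expose.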
#162 | ||
Calibre Plugins Developer
Posts: 4,730
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
leaguesunderthesea
lassassinduroi

The algorithm is designed to discard repeating mapped letters with the same "value" for that character. e and a have the same value, so do s and g, etc. etc. - and you end up with a soundex match unless we crank up the soundex length.

Quote:
At the moment, when you do an author based search, I expand the authors node in the tag browser and make it visible. The next step is to add to that: take the first author from the group under consideration and ensure that node is visible in the tag browser. Otherwise (when you have a lot of authors) you have all those collapsed author groups and still have to do a bit of hunting to actually get to the author node.

The other aspect to that is: would some users get grumpy about the tag browser continually popping into view? Perhaps I should add a checkbox on the search options dialog so users could choose not to use that function (like when working on small screens). Although if they need to rename an author it is the "best" way of doing so, though I guess they could do it from the bulk metadata dialog.

The third aspect of that is, if I made it an option, whether to then support it for title based searches as well. While it would be slightly less used for renaming authors with those searches, it still could be.

If I can figure out those questions (plus whether to offer a soundex spinbox) then I will stick it out as a 1.0 plugin.
#163 | |||
Grand Sorcerer
Posts: 12,447
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Quote:
1) Those that don't understand the tag browser and can use some help. They won't change the box. Having the browser select the author might help them understand what one can do with the browser, although it will also be another example of mysterious behavior.
2) Those that understand and use the tag browser. The integration will help.
3) Those that understand but do not use the tag browser. Having it pop open would be a curse.

I also wonder about how you do the positioning. Should you just find the node and position on it, or should you enter something into the tag browser search box and trigger a search? My guess is that the former is what you should do, because a) another find won't find anything different, and b) probably the *vast* majority of people have no clue what that box is for and wouldn't notice the text in it.

Quote:
My guess is that an option is required. If title is set to Ignore, the option defaults 'on'. If title is set to anything else, the option defaults 'off'.

Do you remember the settings of these options? The tag browser checkbox would be a good candidate for remembering. If someone doesn't want it, s/he probably *really* doesn't want it.
#164 |
Calibre Plugins Developer
Posts: 4,730
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Hi Charles, thx for your feedback again.
Yeah, I was thinking of just boxing the node to ensure it is visible, no searches, in a similar way to what my user category plugin does. So it would be a call to "self.gui.tags_view.model().find_item_node()" then "show_item_at_path()" or something like that.

All settings are persisted in the config file as soon as the user clicks ok in the dialog, so yes, everything is "remembered".

Sounds like we agree the user needs an option to disable the tag browser integration. It is just how that would interoperate with changes to the type of title search that I'm not clear about.

The simplest option would be a single checkbox that says "Show the first author in each group in the tag browser". This would take effect regardless of the type of search.

The second option would be to name it "Show the first author in the tag browser for ignore title searches". As per the name, doing anything but an author duplicate search would never manipulate the tag browser.

The third option would be to offer two checkboxes, one for "ignore title searches" and one for "title searches". We could enable/disable the appropriate checkboxes as the user switches the title match radio buttons so the user knows which is relevant.

I can't think of how a setting which changes each time you click a radio button can successfully work in combination with a "remembered" setting without the user wondering wtf is going on?

What did you think of the soundex - shall we let the user tune it with spinboxes or not?
#165 |
Calibre Plugins Developer
Posts: 4,730
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
After more thought I propose the following:
My original requirement was the most likely scenario of a user doing an ignore title based search (so looking for duplicate authors) and, on finding a group, wanting to rename the authors involved.

Forget about tag browser authors for title based searches. Yes, it is "possible" a duplicate author could be found that way, but the workflow I would always recommend is to do your "ignore title" searches first, which should ensure any author renaming is already handled. Then focus on looking for various title matches with identical authors.

So... a checkbox related to just author based searches should do the trick: "Highlight authors in tag browser for ignore title searches".

I then decided to draw boxes around *all* the authors in the current group in the tag browser. I think that works REALLY well to give visual focus to just the authors under consideration for this group. Obviously it doesn't apply to when viewing one group at a time, as all the authors would be boxed. Though I think it appropriate to have the tag browser visible and the authors node expanded ready.

One decision left... should I offer people the ability to tune the fuzziness of their soundex matches?
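The rule proposed above is small enough to sketch in a few lines. Everything here is hypothetical (the `SearchOptions` class, its field names, the title match values and the JSON round trip); it only illustrates a single remembered checkbox that takes effect for ignore title searches and is otherwise left alone, and it does not touch calibre's real config or tag browser APIs.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SearchOptions:
    # Hypothetical option names and values, for illustration only.
    title_match: str = "identical"    # e.g. "identical", "soundex", "ignore"
    author_match: str = "identical"
    highlight_authors_in_tag_browser: bool = True


def should_highlight_authors(options):
    """Only ignore title (author duplicate) searches touch the tag browser."""
    return (options.title_match == "ignore"
            and options.highlight_authors_in_tag_browser)


def save_options(options):
    """Serialise the options, as would happen when the user clicks OK."""
    return json.dumps(asdict(options))


def load_options(text):
    """Restore previously saved options."""
    return SearchOptions(**json.loads(text))


saved = save_options(SearchOptions(title_match="ignore",
                                   highlight_authors_in_tag_browser=False))
restored = load_options(saved)
print(should_highlight_authors(restored))
```

Because the checkbox value is never changed by switching the title match radio buttons, the remembered setting stays predictable: it is simply ignored for title based searches.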