04-19-2011, 07:19 AM | #106 | ||||
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
You might also want to hook into gui.search_restriction.currentIndexChanged(int), so you will know if the user changes the restriction. Quote:
Quote:
Doesn't this depend on my mental model at the time? If I am really looking for duplicate books and choose 'only author', then I might want to use groups and highlighting. Yes, this is a very fuzzy search, but that is what I asked for. However, if my mental model is 'looking for author problems', then yes, I want to use group-at-a-time mode, and play with the tag browser etc. I also might want to use group-a-a-t mode if I am checking other metadata such as series or tagging issues. What is wrong with adding an 'ignore title, fuzzy author' combination and leaving it up to me to choose how I want to see them? You might want to change the default to g-a-a-t, but I don't think you should prevent me from using highlighting. Quote:
|
||||
04-19-2011, 07:40 AM | #107 | ||
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Quote:
|
||
04-19-2011, 08:03 AM | #108 | |
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
After some thought, I think you would be better served by hooking into activated instead of currentIndexChanged. If the user chooses index 1 (current search), activated will fire and the restriction will change to the current search. However, currentIndexChanged will not fire, because the index didn't change.
I don't know if you should unhook/rehook, or if you should have an 'ignore' flag. By the latter I mean using a double-signal arrangement, something like:
Code:
    from PyQt4.Qt import QObject, pyqtSignal

    class RestrictionWatcher(QObject):  # class name is only for illustration
        # Signals have to be declared as class attributes
        restriction_changed = pyqtSignal()

        def __init__(self, gui):
            QObject.__init__(self)
            # Set this to True while changing the restriction yourself
            self.ignoring_signals = False
            gui.search_restriction.activated[int].connect(self.do_search_restriction_activated)
            self.restriction_changed.connect(self.do_restriction_changed)

        def do_search_restriction_activated(self, idx):
            if not self.ignoring_signals:
                self.restriction_changed.emit()

        def do_restriction_changed(self):
            pass  # do what you need to do
FWIW |
|
04-19-2011, 08:20 AM | #109 | ||
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Say I do a search for "dragon", then choose "*Current search" in the restriction dropdown. In the dropdown my only options now are blank and "dragon" (plus any other saved searches, obviously). The *Current search option has been replaced with dragon. Now if I choose "dragon" in the restriction dropdown, in actual fact that clears the restriction and puts "*Current search" in its place.
An alternative suggestion (since it is only fair, if I say I found something unexpected, to suggest an alternative, however crap it may be) is:
(a) *Current search always stays there
(b) a new entry of *dragon appears below *Current search
So if a user chooses *dragon again in the dropdown, nothing happens. If a user has typed another search and wants to apply it immediately, they can choose *Current search. They don't have to first drop the restriction and then retype the search (as will happen if they forgot the restriction was on).
My final comment: could the tooltip for the restriction dropdown show the full text of the current search when you have one selected? It gets shortened to illegibility in the non-resizable dropdown.
Just my 2p. I don't really care that much, honest. Quote:
|
||
04-19-2011, 09:34 AM | #110 | |||
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
I don't see a need to clear that search if I select *current search with an empty search box. Instead I select index 0, which clears the restriction. Quote:
Quote:
|
|||
04-19-2011, 09:59 AM | #111 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
I have lots of similar cases. In my pre-Calibre days, when I relied on filename/folder indexing, I named my ebook files with only a single author, and I made multiple copies of each book, when I had multiple authors. That way I could find the book under each author. When I imported those books into Calibre, each multiple-author book came in multiple times with single authors. I've been slowly merging them and fixing the authorship. Quote:
|
||
04-19-2011, 10:08 AM | #112 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
That all sounds great, thanks. One more related question: is there a robust way for me to know whether the user has a custom or a saved search selected at the time they find duplicates? I have to call a different function to restore the value, so I need to know which it is. The hack I currently use will break with your change.
Excellent point on the alternative place to hook the event from, thanks. |
04-19-2011, 10:33 AM | #113 | |
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Are you going to restore the search even if it wasn't active? My feeling is that you don't need to. In fact, after some reflection I wonder if you need to restore the restriction at all. The user can easily do that manually if desired. |
|
04-19-2011, 10:42 AM | #114 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Typing on my phone, so apologies if my wording was confusing. I never restore a search, but I do restore the restriction, which might have been from a custom search. Your suggested code is what I would have done, but I wanted to make sure I wasn't missing something. It would be too weird for the clear search button to actually result in a search.
|
04-19-2011, 12:44 PM | #115 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
v0.4.2 Beta
This hopefully fixes two issues without introducing new ones:
The first item is as per previous posts - you can either manually change a search restriction or click the clear button in the gui to exit search mode. I've made this change in such a way (temporarily) that it should work for people both running 0.7.56 and running from source with Charles's changes to the restrictions dropdown today. Last edited by kiwidude; 04-22-2011 at 05:59 AM. Reason: Removed attachment as later version in thread |
04-20-2011, 10:01 AM | #116 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
v 0.5 Beta - How fuzzy wuzzy wuzza duck?
Ok, so I have an implementation put together for supporting "author duplicate" (ignore title) searches. And it seems to actually work without having to completely start all over again, which is both surprising and gratifying.
My plan was to add the following algorithms:
- ignore title, similar author
- ignore title, fuzzy author
However, having implemented the first by reusing the same "similar author" logic that I am using for "similar title, similar author", I noticed some unexpected fuzziness.
Specifically, for my initial implementation of "similar author", to get this plugin up and running, I decided just to invoke Kovid's author simplifying algorithm used for metadata retrieval (get_author_tokens() in the Source class in ebooks/metadata/sources/base.py). What I found, however, is that it is a bit too fuzzy/aggressive for a "similar" author search. What it does that goes against my personal desire for "similar" is remove initials. So for example "J. Smith" becomes "Smith" and would match "W. Smith" in a duplicate search. Which raises the question of how fuzzy wuzzy each algorithm should go.
So my suggestion is that "similar authors" will use the same logic as get_author_tokens, but not strip initials. That leaves it handling removal of punctuation, different spacing and reversal of names like LN,FN to FN LN.
The "fuzzy authors" algorithm would then be left to be more aggressive. Either it could attempt to determine a "last name" and ignore everything else (and yes, I know there are lots of issues with determining the "last" name with Jr. etc., but we could, if wanted, attempt to cater for some common cases). Or, slightly more usefully, it could take the last name and prefix it with one initial, being either the first letter of the first name or the first initial, whatever is found. So W. Smith / Wayne Smith / Smith, W. would all match under either fuzzy proposal. However, W. Smith / S. Smith would not be returned as a match under the second.
Or perhaps you have different ideas for "similar" and "fuzzy". What are your thoughts? The attached plugin version has no changes to the "similar" logic, so you can see for yourself.
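To make the two proposals concrete, here is a rough sketch of what the keys might look like. This is not the plugin's code and does not call get_author_tokens; the function names (similar_author_key, fuzzy_author_key) and the helper _words are made up for illustration, and "last name" is naively assumed to be the last word once LN,FN has been flipped.

```python
import re

def _words(author):
    """Lower-case the name, flip 'LN, FN' to 'FN LN' and drop punctuation."""
    if ',' in author:
        last, _, first = author.partition(',')
        author = first + ' ' + last
    # Anything that is not a word character or whitespace becomes a space
    author = re.sub(r'[^\w\s]', ' ', author.lower())
    return author.split()

def similar_author_key(authors):
    """Conservative key: ignores punctuation, spacing and LN,FN order, keeps initials."""
    if not authors:
        return ''
    return ' '.join(_words(authors[0]))

def fuzzy_author_key(authors):
    """Aggressive key: one leading initial plus the (assumed) last name."""
    if not authors:
        return ''
    words = _words(authors[0])
    if not words:
        return ''
    if len(words) == 1:
        return words[0]
    return words[0][0] + ' ' + words[-1]
```

With these, "J. Smith" and "W. Smith" get different "similar" keys, while W. Smith / Wayne Smith / Smith, W. share one "fuzzy" key and S. Smith does not. Suffixes such as Jr. would need extra handling that this sketch does not attempt.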
Other changes I made to support ignore title logic:
As per usual I may have accidentally introduced some new quirks with this version, but I really wanted to get something out there for feedback, so your patience and understanding is appreciated. Last edited by kiwidude; 04-25-2011 at 02:16 PM. Reason: Removed attachment as later version in this thread |
04-20-2011, 10:26 AM | #117 | ||
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
I thought fuzzy wuzzy was a bear, not a duck.
Quote:
Quote:
Similar: same name ignoring punctuation and word order
Fuzzy, alt 1: Strip initials (one letter words?). Match what is left.
Fuzzy, alt 2: At least one word matches (how long must the word be?). The first letters of other words must match. Note that using this algorithm, I think that Sam Wayne would match Wayne Smith. I don't see how you can avoid this, unless you start attaching great meaning to commas.
If I have this right, then I think I agree with you. Similar should be as described, which is very conservative. Least fuzzy should be alt 2. More fuzzy should be alt 1. You might consider inserting soundex between least fuzzy and more fuzzy. It should work reasonably well, at least for names that are pronounced reasonably correctly in English.
Will try the plugin real-soon-now. |
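For illustration only, the two fuzzy alternatives above could be sketched roughly like this. The names name_words, fuzzy_alt1_key and fuzzy_alt2_match are invented for the example, and the min_len parameter is just a guess at the open "how long must the word be?" question.

```python
import re

def name_words(name):
    """Lower-case words of a name with punctuation removed."""
    return re.sub(r'[^\w\s]', ' ', name.lower()).split()

def fuzzy_alt1_key(name):
    """Alt 1: strip one-letter words, match what is left (word order ignored)."""
    return ' '.join(sorted(w for w in name_words(name) if len(w) > 1))

def fuzzy_alt2_match(a, b, min_len=2):
    """Alt 2: at least one word is shared, and the first letters of the
    remaining words agree. min_len is an assumed lower bound on the
    length of a shared word."""
    wa, wb = name_words(a), name_words(b)
    shared = {w for w in wa if len(w) >= min_len} & set(wb)
    if not shared:
        return False
    rest_a = sorted(w[0] for w in wa if w not in shared)
    rest_b = sorted(w[0] for w in wb if w not in shared)
    return rest_a == rest_b
```

Under this reading, fuzzy_alt2_match("Sam Wayne", "Wayne Smith") is indeed True, because "wayne" is shared and the leftover words both start with "s". Note that alt 2 is a pairwise test rather than a key, so it cannot be dropped straight into a group-by-key approach.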
||
04-20-2011, 10:38 AM | #118 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
When you go this fuzzy, bear == duck.
Sorry for the confusion, I'm just throwing ideas out there. Yes, I am saying that Similar should be the conservative approach of only removing punctuation and ignoring word order. As to how many "fuzzy" algorithms there are and what form they take, I welcome all input. Even better, write and post a function for any proposal: one that takes an author name (well, actually a list, but we only consider the first author) and returns a string representing the fuzzied result.
My brain hurts a bit right now from twisting it through the permutations of author and book searches over the last few days (and goodreads metadata before that), so undoubtedly others will have better coding suggestions than I can conjure up in my current state. |
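As one possible candidate matching that contract (a list of authors in, a string out), here is a sketch based on the soundex idea raised earlier in the thread. It is a plain American Soundex written from scratch, not anything taken from calibre; soundex_author_key is a hypothetical name, and only ASCII letters are considered, so accented names would need transliterating first.

```python
import re

# Letter -> Soundex digit; vowels, h, w and y have no code
_CODES = {}
for _digit, _letters in enumerate(('bfpv', 'cgjkqsxz', 'dt', 'l', 'mn', 'r'), start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)

def soundex(word):
    """Four-character American Soundex code for a single word."""
    letters = [c for c in word.lower() if 'a' <= c <= 'z']
    if not letters:
        return ''
    result = letters[0].upper()
    prev = _CODES.get(letters[0], '')
    for c in letters[1:]:
        code = _CODES.get(c, '')
        if code and code != prev:
            result += code
        # h and w do not separate two letters that share a code
        if c not in 'hw':
            prev = code
    return (result + '000')[:4]

def soundex_author_key(authors):
    """Key for the first author: sorted Soundex codes of each word."""
    if not authors:
        return ''
    words = re.findall(r'[a-z]+', authors[0].lower())
    return ' '.join(sorted(soundex(w) for w in words))
```

Sorting the codes means word order is ignored, so "Smith, John" and "Jon Smyth" end up with the same key. Initials are kept as their own codes here, which makes this less aggressive than stripping them.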
04-23-2011, 03:25 AM | #119 |
Member
Posts: 14
Karma: 10
Join Date: Sep 2010
Device: Kindle³
|
The current version still fails at detecting books like this:
Title
Title: Subtitle
Example:
Brian Greene - The Hidden Reality
Brian Greene - The Hidden Reality: Parallel Universes and the Deep Laws of the Cosmos
I've used every option and still find duplicates manually. |
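One way a title comparison could catch this case is to compare only the part before the first colon. The following is a hedged sketch of that idea, not something the plugin is known to do; title_key is a made-up name.

```python
import re

def title_key(title):
    """Key built from the main title only: text before the first colon,
    lower-cased, punctuation removed, whitespace collapsed."""
    main = title.split(':', 1)[0]
    main = re.sub(r'[^\w\s]', ' ', main.lower())
    return ' '.join(main.split())
```

Both titles in the example above reduce to "the hidden reality". The obvious downside is that titles using the colon for a series prefix (same text before the colon, different book after it) would also be grouped together, so this would suit a fuzzier option rather than the default.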
04-23-2011, 04:14 AM | #120 | |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Quote:
|
|
|