Old 04-16-2011, 06:39 AM   #76
kiwidude
Calibre Plugins Developer
Posts: 4,735
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Quote:
Originally Posted by chaley View Post
Given that the operation adds exemptions for all the books in a group, selections really don't have any meaning. So, unless you are intending to allow subsets of the group (are you?), then I think it is sufficient to pop up a question box to tell the user that exemptions will be added for all the books in the group and the selections will be ignored -- OK? If the user is confused, then I hope s/he pushes cancel, re-locates the group, selects nothing, and does it again. Of course, you must tolerate the first book of a group being selected, or (probably better) any one book in the group.
Definitely not thinking subgroups. All I was thinking is that the selection could happen to be "anything" when they choose "Mark group as exempt". In a perfect world they just have one or more rows selected, and they all sit within the current group that will have exemptions made from it. No ambiguity. However what if their selection also happened to overlap into another group? Do I just check that the first row in the selection lies within the current group, do I require that all selected rows lie in the group, or just that one of them does? That was the sort of question I avoided answering.

Also for instance a user might move the selection off the current group (when showing all groups) just so they can more clearly see all its members (since we have the issue of the selection colour overwriting the green highlighting). So are they getting an additional dialog box to tell them they are not on the group to be marked exempt, or are we saying that we just change the message on the existing dialog (the one you want me to add a "don't show me this again" option to, in which case they might never see it) ...
Quote:
Putting aside the above concern, I am not convinced that sliders are the right interface. They imply a level of 'analog' behavior that isn't there, and also don't support tool tips and the like well.
I'm not sure I agree on that as I think "analog" with tickmarks and distinct labelled positions in combination with the actual values I proposed do actually represent a scale of directness of matching. However like I said the Qt sliders are pretty crap, and trying to line up centered labels in a grid next to them etc won't work very well.
Quote:
I would lean toward radio buttons, with two groups. Group 1 would have ISBN, then the title choices, with the first choice being ignore. Group 2 would have the author choices with the first choice being 'ignore', which would line up horizontally with the title group's ignore (nothing beside the ISBN choice). Choosing ISBN would force group 2 to ignore and disable it. Choosing any title option would enable group 2. Choosing ignore for both options can be an error, or can make one big group.
I think I need a picture sorry

I'm not intending to change it at this point; your suggestion of getting wider feedback is valid. It is just that I wanted to add fuzzier author & title algorithms to make this plugin more useful. However adding just a single "fuzzy title, fuzzy author" option might bring back way too many false positives. Maybe "similar title, fuzzy author" and "fuzzy title, similar author" would be the most useful variants of that.

Last edited by kiwidude; 04-16-2011 at 06:48 AM.
Old 04-16-2011, 07:33 AM   #77
ldolse
Wizard
Posts: 1,337
Karma: 123457
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by kiwidude View Post
I'm not intending to change it at this point; your suggestion of getting wider feedback is valid. It is just that I wanted to add fuzzier author & title algorithms to make this plugin more useful. However adding just a single "fuzzy title, fuzzy author" option might bring back way too many false positives. Maybe "similar title, fuzzy author" and "fuzzy title, similar author" would be the most useful variants of that.
You might be able to get around this by adding some ability to find fuzzily similar authors so that users can fix them to be all the same author. This way a user can fix up all their author records first, then do fuzzy/fuzzier title with exact author as a second pass.

Common variations are spaces existing or not existing between initials, initials being dropped or written out as full names, and names stored in author sort or otherwise reversed order. It seems like it would be best suited to this plugin, as you could use all the same logic you're using for duped books and just make larger groups by author.
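
To make a couple of those variations concrete, here is a rough sketch of a normaliser that treats spacing between initials and "Surname, Forenames" order as the same author. The function name is made up for illustration and is not taken from the plugin, and it deliberately does not try to match dropped initials against full names, which would need a fuzzier comparison.

```python
import re


def similar_author_key(author):
    """Hypothetical helper: map spelling variants of one author to one key."""
    name = author.strip()
    # "Tolkien, J. R. R." (author-sort style) -> "J. R. R. Tolkien"
    if "," in name:
        last, _, first = name.partition(",")
        name = first.strip() + " " + last.strip()
    # Treat periods as separators so "J.R.R." and "J. R. R." tokenise alike.
    tokens = re.split(r"[.\s]+", name.lower())
    return " ".join(t for t in tokens if t)
```

Books whose authors produce the same key could then be shown together as one candidate group for fixing up.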

I know for myself I actually get more annoyed by my authors being messed up than by duplicate books. I keep dupes around all the time and just flag the poorer versions with a tag rather than merging/deleting them (crap books are good test candidates for heuristics), but it annoys me to no end trying to find all the messed up variants of the same author, which is basically a different variant of the duped books problem.
Old 04-16-2011, 07:46 AM   #78
kiwidude
Calibre Plugins Developer
@ldolse, that is an interesting idea. I did a few days ago briefly think about some kind of "duplicate author variations" function in the plugin, but put it to the back of my mind because the library view is only suitable for showing books, not aggregations.

Perhaps it would be better to treat this situation as its own special case - with its own dialog and resolution approach. Certainly I will hold off doing anything related to it until I get more time to think through some approaches. It may be that we can fit it into the existing search approach or that it requires something different. You would certainly need to have an equivalent of "ignore title, fuzzy author" so that you could catch where two variations of the same author each have a distinct set of books in your library that none of the existing algorithms would catch.
Old 04-16-2011, 07:56 AM   #79
ldolse
Wizard
Quote:
Originally Posted by kiwidude View Post
Perhaps it would be better to treat this situation as its own special case - with its own dialog and resolution approach. Certainly I will hold off doing anything related to it until I get more time to think through some approaches. It may be that we can fit it into the existing search approach or that it requires something different. You would certainly need to have an equivalent of "ignore title, fuzzy author" so that you could catch where two variations of the same author each have a distinct set of books in your library that none of the existing algorithms would catch
Yeah, I realized there was one aspect that does indeed make it special - in the case where all the author records are identical (no different fuzzy matches) it's not a dupe (unlike a book), so those authors would need to not be marked. Probably not an insurmountable problem though, not sure if a different GUI would be required. But agree that getting the basic functions/feedback is a better priority right now - probably best just to leave the 'fuzzier' considerations alone for the first cut.
Old 04-16-2011, 08:34 AM   #80
kiwidude
Calibre Plugins Developer
Quote:
Originally Posted by ldolse View Post
Yeah, I realized there was one aspect that does indeed make it special - in the case where all the author records are identical (no different fuzzy matches) it's not a dupe (unlike a book), so those authors would need to not be marked. Probably not an insurmountable problem though, not sure if a different GUI would be required
I'm not entirely sure if we are talking about the same thing here or if you just had a typo. Say I have in my library these two books:

1. The Girl With the Dragon Tattoo - Stieg Larsson
2. The Girl With the Dragon Tattoo - S. Larsson

Finding these books is a duplicate book search. None of the algorithms to date will find this scenario.

Now say instead I have these two books

1. The Girl With the Dragon Tattoo - Stieg Larsson
2. The Girl Who Played With Fire - S. Larsson

This is a duplicate author search. None of the algorithms to date or proposed would detect this. This is why I said "ignore title, fuzzy author" would be the only way to get them together as a group.

Now if you wanted something that gave you fewer false positives, you would probably also want an "ignore title, similar author" to catch this situation (fuzzy author would also catch it but it might "bury" it in loads of results):
1. The Lord of the Rings - J.R.R. Tolkien
2. The Hobbit - J. R. R. Tolkien

So we have yet another permutation (you can see why I am tempted to treat title and author as independently set algorithms?)
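
As a rough illustration of treating title and author as independently set algorithms, the sketch below builds a group key from one title function and one author function. Everything here is hypothetical: the function names are invented, "fuzzy author" is simplified to "first initial plus surname", and names are assumed to be in "Forename Surname" order. It is not how the plugin is implemented.

```python
import re
from collections import defaultdict


def _words(text):
    # Lower-case alphanumeric tokens; punctuation and spacing are dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def ignore(_value):
    # Every value maps to the same key, so this field never splits a group.
    return ""


def similar(value):
    # Same words in the same order, regardless of punctuation and spacing.
    return " ".join(_words(value))


def fuzzy_author(author):
    # Simplified stand-in: first initial plus last word of the name.
    words = _words(author)
    return (words[0][0] + " " + words[-1]) if words else ""


def find_groups(books, title_fn, author_fn):
    """Group (title, author) pairs by the combined key; keep groups of 2+."""
    groups = defaultdict(list)
    for title, author in books:
        groups[(title_fn(title), author_fn(author))].append((title, author))
    return [group for group in groups.values() if len(group) > 1]


books = [
    ("The Girl With the Dragon Tattoo", "Stieg Larsson"),
    ("The Girl Who Played With Fire", "S. Larsson"),
    ("The Lord of the Rings", "J.R.R. Tolkien"),
    ("The Hobbit", "J. R. R. Tolkien"),
]
tolkien_only = find_groups(books, ignore, similar)
both_authors = find_groups(books, ignore, fuzzy_author)
```

With these four books, "ignore title, similar author" groups only the two Tolkien spellings, while "ignore title, fuzzy author" also pairs Stieg Larsson with S. Larsson. Each new permutation is just a different pair of functions rather than a new algorithm.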

Then you have to think about how are you going to resolve this scenario. For a start you will want to rename all instances of that author. Before you make that decision, you will want to check that they are indeed the same author. There are of course many genuine situations where J. Smith and J.L. Smith are different authors. So you would want all the books under each of those author names on screen to compare. An "ignore title, similar author" search would give you that. Though it may also (in the "ignore title, fuzzy author" case) give you a load of other authors too.

Another scenario
1. The Lord of the Rings - J.R.R. Tolkien
2. The Lord of the Rings - J. R. R. Tolkien

You would find this pairing with a "similar title, similar author" search we have in there currently. But on spotting it, you would again likely want to rename one of those author variations. Then perhaps run the whole duplicate search again.

The fuzzier that author match is, the more false positives you are going to get (but also the only way you will catch the genuine duplicates from variations in first name/initials).

This is rambling I know but perhaps it explains a little of the variations I think we (ultimately) need to cater for.
Old 04-16-2011, 08:46 AM   #81
chaley
Grand Sorcerer
Posts: 12,525
Karma: 8065948
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by kiwidude View Post
I think I need a picture sorry
I think this is the slider you want? I did this with a vertical layout containing two horizontal layouts. The top layout has the labels, the bottom one the slider.
[Image: Clipboard01.png]

This is what I meant by the radio buttons. EDIT: I forgot the 'ignore me' buttons.
[Image: Clipboard02.png]
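
In case the enable/disable rules for the two radio groups are easier to follow as logic than as a picture, here is a minimal sketch with the GUI left out. The choice names and the function are hypothetical, and treating "ignore" for both groups as an error is just one of the two options mentioned (the other being one big group).

```python
# Hypothetical option names; group 1 holds ISBN plus the title choices.
TITLE_CHOICES = ("isbn", "ignore", "identical", "similar", "fuzzy")
AUTHOR_CHOICES = ("ignore", "identical", "similar", "fuzzy")


def resolve_choices(title_choice, author_choice):
    """Return (author_group_enabled, effective_author_choice)."""
    if title_choice not in TITLE_CHOICES or author_choice not in AUTHOR_CHOICES:
        raise ValueError("unknown choice")
    if title_choice == "isbn":
        # ISBN forces the author group to 'ignore' and disables it.
        return False, "ignore"
    if title_choice == "ignore" and author_choice == "ignore":
        # Assumption: reject this rather than build one big group.
        raise ValueError("title and author cannot both be ignored")
    # Any title option (including 'ignore') leaves the author group enabled.
    return True, author_choice
```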

Last edited by chaley; 04-16-2011 at 08:53 AM.
Old 04-16-2011, 08:50 AM   #82
kiwidude
Calibre Plugins Developer
Quote:
Originally Posted by chaley View Post
I think this is the slider you want?...
Thanks Charles, you have been busy. You can see what I mean by the difficulty of lining up the tickmark labels nicely. Thanks for the radio buttons, as they say a picture is worth a thousand words.
Old 04-16-2011, 08:50 AM   #83
chaley
Grand Sorcerer
Quote:
Originally Posted by kiwidude View Post
Definitely not thinking subgroups. All I was thinking is that the selection could happen to be "anything" when they choose "Mark group as exempt". In a perfect world they just have one or more rows selected, and they all sit within the current group that will have exemptions made from it. No ambiguity. However what if their selection also happened to overlap into another group? Do I just check that the first row in the selection lies within the current group, do I require that all selected rows lie in the group, or just that one of them does? That was the sort of question I avoided answering.
My opinion:

If all selections are in the group, show a dialog saying that the entire group will be added, not just the selected books.

If some selections are in a different group, show a dialog saying that the entire group will be added and that the selections outside the group will be ignored.

If only one book is selected, and if that book is in the group, then show a dialog saying that the entire group will be added.

Use three different ignore_me checkbox names.
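
Roughly, in code, the three cases could be told apart like this. It is only a sketch: the function and the returned key names are invented for illustration (they stand in for the three ignore_me checkbox names), and the case where nothing in the group is selected is left to the caller because it is not covered above.

```python
def pick_exemption_dialog(selected_ids, group_ids):
    """Return a hypothetical dialog key for the current selection, or None."""
    selected = set(selected_ids)
    group = set(group_ids)
    if not selected & group:
        # No selected book lies in the group: not one of the three cases.
        return None
    if selected - group:
        # Some selections fall outside the group and will be ignored.
        return "exempt_group_ignore_outside"
    if len(selected) == 1:
        # A single book, inside the group.
        return "exempt_group_single_book"
    # Several books, all inside the group.
    return "exempt_group_all_selected"
```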
Old 04-16-2011, 08:53 AM   #84
chaley
Grand Sorcerer
Quote:
Originally Posted by kiwidude View Post
@ldolse, that is an interesting idea. I did a few days ago briefly think about some kind of "duplicate author variations" function in the plugin, but put it to the back of my mind because the library view is only suitable for showing books, not aggregations.

Perhaps it would be better to treat this situation as its own special case - with its own dialog and resolution approach. Certainly I will hold off doing anything related to it until I get more time to think through some approaches. It may be that we can fit it into the existing search approach or that it requires something different. You would certainly need to have an equivalent of "ignore title, fuzzy author" so that you could catch where two variations of the same author each have a distinct set of books in your library that none of the existing algorithms would catch
If some algorithm can find them, then I can select the ones I want to change and do it with edit metadata. I don't see why it needs its own resolution approach.
Old 04-16-2011, 08:55 AM   #85
chaley
Grand Sorcerer
Quote:
Originally Posted by kiwidude View Post
I'm not entirely sure if we are talking about the same thing here or if you just had a typo.
What you said in this post matches my mental model, FWIW

PS: Sorry about the rapid-fire posts. It is easier than trying to combine them all into one.
Old 04-16-2011, 09:30 AM   #86
ldolse
Wizard
I guess I wasn't really clear..., no worries.

What I meant was I agree the algorithm would be ignore title, fuzzy author.

But in the case of 'books', we're actually looking for duplicates, so any set of duplicates in a particular group is something you would mark and put in the results a user needs to sort through.

For merging authors if all the author variations are identical, e.g.:
1. The Lord of the Rings - J. R. R. Tolkien
2. The Two Towers - J. R. R. Tolkien
3. The Return of the King - J. R. R. Tolkien
4. The Hobbit - J. R. R. Tolkien

All these authors are exactly identical. So the algorithm will say they're duplicates, correct? In this case I'm not interested in sorting through this match since they're all 'correct'. So they shouldn't be marked to show up in the results that the user needs to go through.

However if your library has one variant that's not quite right:

1. The Lord of the Rings - J. R. R. Tolkien
2. The Two Towers - J. R. R. Tolkien
3. The Return of the King - J. R. R. Tolkien
4. The Hobbit - J. R. R. Tolkien
5. The Fellowship of the Ring - J.R.R. Tolkien

Then I have a group that should be marked, and since they're not exactly identical I actually want them to be displayed.

So what I'm saying is that with author searches perfect dupes can and should be ignored, which is different from dupe searches focusing on the books themselves. Basically the intended outcome is inverted - with authors I'm trying to make more dupes, with books I'm trying to make fewer.

Did I make any more sense, or did I miss a point earlier which makes this moot?


Agree with chaley about the resolution with edit metadata, that's what I was thinking as well.

Last edited by ldolse; 04-16-2011 at 09:34 AM.
Old 04-16-2011, 09:33 AM   #87
kiwidude
Calibre Plugins Developer
Quote:
Originally Posted by chaley View Post
What you said in this post matches my mental model, FWIW

PS: Sorry about the rapid-fire posts. It is easier than trying to combine them all into one.
Thanks Charles and no problem - individual responses are easier than wading through a range of stuff like I sometimes waffle on about.

So to continue my trend exactly on that point - thx for the suggestions on the selection dialogs and variations, I'll look to make that change.

As for the "not needing a different resolution" for duplicate authors, obviously my follow up post was my attempt to think out loud further on the topic. If we allow the user to do "ignore title" searches, then they will have the "ability" to do what are in effect duplicate author searches. My question I raised over this was whether that actually was the "best" way to approach the problem.

Functionally it will help do the job, and it ticks a number of the boxes for what you will need to do such as seeing all the books for that author to make a decision. However it will mean you end up creating potentially masses of rows of duplicate exemptions versus perhaps an alternative implementation allowing exempting author combinations instead (which would also prevent future books from those authors appearing).
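
To put rough numbers on that, here is a sketch under an assumption of my own: that book-level exemptions are stored as pairs of book ids, one per cross-author pair. The helper names are hypothetical and this is not the plugin's actual storage.

```python
def book_pair_exemptions(books_a, books_b):
    """One exemption per (book by author A, book by author B) pair."""
    return {frozenset((a, b)) for a in books_a for b in books_b}


def is_author_pair_exempt(author_a, author_b, exempt_author_pairs):
    """A single stored author pair covers every book, present or future."""
    return frozenset((author_a, author_b)) in exempt_author_pairs


# Say J. Smith has 3 books and J.L. Smith has 4, and they are different people.
rows_needed = len(book_pair_exemptions([1, 2, 3], [4, 5, 6, 7]))
exempt_authors = {frozenset(("J. Smith", "J.L. Smith"))}
```

Under that assumption the book-based approach needs 12 rows and has to be extended whenever either author gets a new book, whereas the author-based one stays at a single entry.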

I will also add author to the sorting of results (since you will want title within author within marked).
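
Something along these lines, as a minimal sketch. The row shape (dicts with "marked", "author" and "title" keys) is assumed purely for illustration and is not the plugin's real data model.

```python
def sort_results(rows):
    """Marked rows first, then by author, then by title within each author."""
    return sorted(
        rows,
        key=lambda r: (not r["marked"], r["author"].lower(), r["title"].lower()),
    )
```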

With a standalone duplicates tool I wrote a while ago it did several passes through the data running various types of algorithms. So I could consider and attack the low hanging fruit first of the "almost certain it is a duplicate" cases, then effectively re-run again to find the "less likely to be a duplicate" cases etc. From a workflow perspective I first considered my report of potential author duplicates - because if you got the authors renamed correctly then it meant that subsequently running a title based comparison with exact author you were down to just genuine merges required.

So in this author based report of my Calibre library, I could quickly just scan down a list of author names. Once I found a "suspicious" new combination, then in Calibre I would do an "or" based search for those authors, and look up the authors on FantasticFiction with Search the Internet to verify all the books did appear under a single author name.

None of this is precluded from being done with the plugin (assuming we add more flexible title/author permutations). It just may involve a lot more scrolling, a lot more "noise" and a lot more duplicate exemptions. Plus you have to consider how you will actually do the renaming as well - finding the author in the tag browser, drilling down into the author then doing a bulk edit or whatever. No real harm, and of course you could refine things a bit by using the search box or tag browser, but it is a slightly different variation mentally in approaching the resolution. If instead there was a more author-centric view of the results in this scenario, it could save a few of these steps.

As a first cut what we have is fine bar the lack of support for author based duplicate searches imho.
Old 04-16-2011, 09:42 AM   #88
kiwidude
Calibre Plugins Developer
@ldolse - your latest post #86 has brought up an interesting point I hadn't thought about in terms of the implementation. An ignore title based search would need some changes to the internal logic in a few places.

Currently a duplicate group is considered as one that has more than one book in it. That works fine for all our current algorithms because they are all book based searches.

However an author based search will need different logic, because as you say you are actually only interested in groups of results which have more than one author left in them. So you need to scan across the books and check their authors before you can decide that the group is no longer relevant. That happens in a few places in the code - both when initially presenting the duplicate groups and when you move to the next result (to cater for books that get deleted/merged etc).

Hmmm....
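
Thinking out loud, the two rules might be sketched like this. The function name and the (title, author) tuple shape are hypothetical, not the plugin's code.

```python
def group_is_relevant(books, author_based):
    """books is a list of (title, author) pairs still left in the group."""
    if author_based:
        # Only interesting while more than one author spelling remains.
        return len({author for _title, author in books}) > 1
    # Book based searches: any group with more than one book is a duplicate.
    return len(books) > 1


same_spelling = [
    ("The Lord of the Rings", "J. R. R. Tolkien"),
    ("The Two Towers", "J. R. R. Tolkien"),
    ("The Return of the King", "J. R. R. Tolkien"),
    ("The Hobbit", "J. R. R. Tolkien"),
]
mixed_spelling = same_spelling + [("The Fellowship of the Ring", "J.R.R. Tolkien")]
```

The same check would need to run both when the groups are first presented and again when moving to the next result, since a rename can leave a group with only one author spelling.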
Old 04-16-2011, 09:52 AM   #89
ldolse
Wizard
Quote:
Originally Posted by kiwidude View Post
@ldolse - your latest post #86 has brought up an interesting point I hadn't thought about in terms of the implementation. An ignore title based search would need some changes to the internal logic in a few places.

Currently a duplicate group is considered as one that has more than one book in it. That works fine for all our current algorithms because they are all book based searches.

However an author based search will need different logic, because as you say you are actually only interested in groups of results which have more than one author left in them. So you need to scan across the books and check their authors before you can decide that the group is no longer relevant. That happens in a few places in the code - both when initially presenting the duplicate groups and when you move to the next result (to cater for books that get deleted/merged etc).

Hmmm....
Right - not sure how much more complicated that makes things. Anyway, for this type of author search I don't think you need to worry about special handling for books being deleted/merged. While a user could certainly do it while those results are being displayed, a merge/delete isn't the goal in this case, just changing the metadata. Not sure if that helps any.
Old 04-16-2011, 10:04 AM   #90
kiwidude
Calibre Plugins Developer
Quote:
Originally Posted by ldolse View Post
Right - not sure how much more complicated that makes things. Anyway, for this type of author search I don't think you need to worry about special handling for books being deleted/merged. While a user could certainly do it while those results are being displayed, a merge/delete isn't the goal in this case, just changing the metadata. Not sure if that helps any.
It's not a major deal but it is something I need to consider.

In fact I think there is another bigger issue and that is duplicate exemptions, which again are book based. If the purpose of a duplicate group for resolution purposes is to show all books for authors that have duplicates, then you won't want other books by those authors exempted out of it (or the groups partitioned because of it).

This all just confirms my thoughts that author based duplicate searching has to be considered "separately" in terms of the impact it has and perhaps how it should be presented.
All times are GMT -4.