Old 05-01-2011, 06:12 AM   #236
chaley
Grand Sorcerer
 
Posts: 12,467
Karma: 8025600
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by kiwidude
The author's point is an interesting one. One issue with having multiple author cache entries which I didn't see you mention is replication of groups.
You are right. Hmmm...
Quote:
I guess the question is whether this is a problem or not. If the user resolves their duplicates in order, the second group, if identical, would disappear automatically. If they skip through them with highlighting, it may jump around a bit but would still be valid.
Thinking on paper and (I think) agreeing with you: what you are saying is that adding a book to multiple buckets can create a situation where one group is a (possibly improper) subset of another. It seems to me that there isn't much point in showing both groups, at least in author mode. For example, why show a group containing (1,2,3) and another containing (2,3)?

Subsets can be removed rather easily, with performance that should be acceptable if there aren't thousands of groups. Something like this:
Spoiler:
Code:
def clean_dup_groups(dups):
    # Turn each candidate group into a set and sort by size, smallest first,
    # so any group that is a subset of another comes before its superset.
    res = [set(d) for d in dups]
    res.sort(key=len)
    ans = []
    for i, a in enumerate(res):
        # Keep a group only if it is not contained in any later (same-size
        # or larger) group; exact duplicates collapse to a single copy.
        for b in res[i+1:]:
            if a.issubset(b):
                break
        else:
            ans.append(a)
    return ans


dups = [(1,2,3),(4,5)]
print dups
print clean_dup_groups(dups)

print '========================'
dups = [(1,2,3,4,5), (1,6,7)]
print dups
print clean_dup_groups(dups)

print '========================'
dups = [(1,2,3,4,5), (1,6,7), (1,6,7)]
print dups
print clean_dup_groups(dups)

print '========================'
dups = [(1,2,3,4,5), (1,6,7), (3,4), (6,7)]
print dups
print clean_dup_groups(dups)


with output:
[(1, 2, 3), (4, 5)]
[set([4, 5]), set([1, 2, 3])]
========================
[(1, 2, 3, 4, 5), (1, 6, 7)]
[set([1, 6, 7]), set([1, 2, 3, 4, 5])]
========================
[(1, 2, 3, 4, 5), (1, 6, 7), (1, 6, 7)]
[set([1, 6, 7]), set([1, 2, 3, 4, 5])]
========================
[(1, 2, 3, 4, 5), (1, 6, 7), (3, 4), (6, 7)]
[set([1, 6, 7]), set([1, 2, 3, 4, 5])]

Quote:
If they added exemptions using "mark all groups", it would create some duplication, I think, but not a major drama.
Again, I don't see a reason to keep exemption groups that are subsets of another group. The same set cleanup would fix this, eliminating the subsets.
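
A minimal sketch of that idea, reusing the clean_dup_groups() above on a made-up list of exemption groups (the book ids are purely illustrative, not anything from the plugin's real data):
Code:
# Hypothetical exemption groups, e.g. as collected via "mark all groups".
# The ids here are invented for illustration only.
exemption_groups = [(10, 11, 12), (11, 12), (20, 21), (20, 21)]
print clean_dup_groups(exemption_groups)
# -> [set([20, 21]), set([10, 11, 12])]
The subset (11, 12) and the duplicate (20, 21) both disappear, which I think is exactly the behaviour we want for exemptions.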