Grand Sorcerer
Posts: 12,450
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by kiwidude
@chaley - actually I think I found another issue with that algorithm. I don't think it actually "works".
For instance if I set
dups = [(3,4),(3,5)]
initial_dups=[1,2,3,4,5,6]
The results it gives me are:
[1,2,3,4,5],[1,2,4,5,6]
Look at the first group - it has 3 and 4 together. Yet they are specifically exempted from appearing together in a group, and instead 6 has been removed?
The problem you found has to do with how the non-dup sets were built. The code assumed that each book's set was complete, but in your example it isn't: book 3 appears in two pairs, so its sets need to be unioned. With the dup_map building code changed to do that (a one-character change), the partitioning algorithm produces the right answer.
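For clarity, here is the map-building step in isolation on kiwidude's pairs (Python 3 print syntax; I'm assuming the broken version assigned with `=` rather than `|=`, which fits the "one character" description):

```python
from collections import defaultdict

dups = [(3, 4), (3, 5)]  # kiwidude's example: 3 is not a duplicate of 4 or of 5
not_duplicate_of_map = defaultdict(set)
for t in dups:
    s = set(t)
    for b in t:
        # '|=' (set union) is the one-character fix: with plain '=', the
        # pair (3, 5) would overwrite the set {3, 4} already stored for book 3
        not_duplicate_of_map[b] |= s

print(sorted(not_duplicate_of_map[3]))  # [3, 4, 5]
```

With the union, book 3's set correctly records that it must be split from both 4 and 5.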
Code:
books known not to be duplicates [(3, 4), (3, 5)]
candidate duplicates [1, 2, 3, 4, 5, 6]
After partitioning [[1, 2, 3, 6], [1, 2, 4, 5, 6]]
That said, the partitioning algorithm I used was stupid. Although you have already done your own and therefore don't need this, for completeness here is a much better one that avoids the highly exponential behavior of the last one. It is still not linear, but I don't think it can be, because of the nature of partitioning. This code also contains the one-character change to correctly build the non-duplicate sets.
Spoiler:
Code:
from collections import defaultdict

# Construct map of books that are not duplicates
dups = [(3, 4), (3, 5), (3, 6), (7, 8), (8, 9)]
print 'books known not to be duplicates', dups
not_duplicate_of_map = defaultdict(set)
for t in dups:
    s = set(t)
    for b in t:
        not_duplicate_of_map[b] |= s

# Simulate a test
initial_dups = [i for i in xrange(1, 10)]
initial_dups.sort()
print 'candidate duplicates', initial_dups

# Initial condition -- the group contains 1 set of all elements
results = [set(initial_dups)]
partitioning_ids = [None]

# Loop through the set of duplicates, checking to see if the entry is in a non-dup set
for one_dup in initial_dups:
    if one_dup in not_duplicate_of_map:
        # The entry is indeed in a non-dup set. We may need to partition
        for i, res in enumerate(results):
            if one_dup in res:
                # This result group contains the item with a non-dup set. If the item
                # was the one that caused this result group to partition in the first place,
                # then we must not partition again or we will make subsets of the group
                # that split this partition off. Consider a group of (1,2,3,4) and
                # non-dups of [(1,2), (2,3)]. The first partition will give us (1,3,4)
                # and (2,3,4). Later when we discover (2,3), if we partition (2,3,4)
                # again, we will end up with (2,4) and (3,4), but (3,4) is a subset
                # of (1,3,4). All we need to do is remove 3 from the (2,3,4) partition.
                if one_dup == partitioning_ids[i]:
                    results[i] = (res - not_duplicate_of_map[one_dup]) | set([one_dup])
                    continue
                # Must partition. We already have one partition, the one in our hand.
                # Remove the dups from it, then create new partitions for each of the dups.
                results[i] = (res - not_duplicate_of_map[one_dup]) | set([one_dup])
                for nd in not_duplicate_of_map[one_dup]:
                    # Only partition if the duplicate is larger than the one we are looking
                    # at. This is necessary because the non-dup set map is complete,
                    # map[2] == (2,3), and map[3] == (2,3). We know that when processing
                    # the set for 3, we have already done the work for the element 2.
                    if nd > one_dup and nd in res:
                        results.append((res - not_duplicate_of_map[one_dup]) | set([nd]))
                        partitioning_ids.append(nd)

print 'After partitioning'
sr = []
for r in results:
    if len(r) > 1:
        sr.append(sorted(list(r)))
sr.sort()
for r in sr:
    print r
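In case it helps anyone reading along later, here is the same algorithm restated as a Python 3 function (the function wrapper and names are mine), with a quick check that no resulting group contains a known non-duplicate pair:

```python
from collections import defaultdict

def partition_candidates(initial_dups, not_dup_pairs):
    """Split the candidate-duplicate group so that no resulting group
    contains two books known not to be duplicates of each other."""
    # Build the non-duplicate map, unioning overlapping pairs
    not_duplicate_of_map = defaultdict(set)
    for t in not_dup_pairs:
        s = set(t)
        for b in t:
            not_duplicate_of_map[b] |= s
    results = [set(initial_dups)]
    partitioning_ids = [None]
    for one_dup in sorted(initial_dups):
        if one_dup not in not_duplicate_of_map:
            continue
        for i, res in enumerate(results):
            if one_dup not in res:
                continue
            # Strip the books known not to match one_dup, keeping one_dup itself
            results[i] = (res - not_duplicate_of_map[one_dup]) | {one_dup}
            if one_dup == partitioning_ids[i]:
                continue  # this group is already the partition split off for one_dup
            # Create a new partition for each larger non-dup still in the group
            for nd in not_duplicate_of_map[one_dup]:
                if nd > one_dup and nd in res:
                    results.append((res - not_duplicate_of_map[one_dup]) | {nd})
                    partitioning_ids.append(nd)
    return sorted(sorted(r) for r in results if len(r) > 1)

pairs = [(3, 4), (3, 5), (3, 6), (7, 8), (8, 9)]
groups = partition_candidates(range(1, 10), pairs)
# Every known non-duplicate pair must end up split across groups
assert all(not {a, b} <= set(g) for g in groups for a, b in pairs)
print(groups)
```

The two branches assign `results[i]` identically, so the assignment is hoisted above the `partitioning_ids` test; the behavior is unchanged from the code above.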