Quote:
Originally Posted by kiwidude
I guess the question is the memory vs. performance tradeoff and whether to just go ahead and build dup group maps on the fly. That is not the only place the exemption map is used, of course, as I have, for instance, menu items enabled based on whether a book has exemptions, along with the show-all-exemptions stuff. So these would need to change to use the build-on-the-fly approach too.
You can hide the on-demand part by creating a subclass of defaultdict and adding a method that returns the merged set. Just for fun, an example is in the spoiler below.
Spoiler:
Code:
from collections import defaultdict

class ExemptionMap(defaultdict):

    def __init__(self, default_factory):
        defaultdict.__init__(self, default_factory)

    def merge_sets(self, key):
        # get() avoids creating an empty entry for a key with no exemptions
        list_of_sets = self.get(key, [])
        if len(list_of_sets) == 0:
            return set()
        if len(list_of_sets) == 1:
            # Fast path: returns the stored set itself, so treat it as read-only
            return list_of_sets[0]
        return set().union(*list_of_sets)
Using that class, the exemption map would be built as below.
Code:
not_duplicate_of_map = ExemptionMap(list)
for t in dups:
    # All books in a group share one set instance, so each group's
    # memory cost is paid only once
    s = set(t)
    for b in t:
        not_duplicate_of_map[b].append(s)
The map would be used as below.
Code:
ndm_entry = not_duplicate_of_map.merge_sets(one_dup)
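To make the flow concrete, here is a minimal end-to-end sketch, assuming the ExemptionMap class above. The group contents are made-up book ids; the real dups structure would come from the plugin's stored exemptions.
Code:
# Hypothetical exemption groups; in practice these come from the
# plugin's stored duplicate exemptions, not hard-coded ids
dups = [(1, 2, 3), (2, 5)]

not_duplicate_of_map = ExemptionMap(list)
for t in dups:
    s = set(t)
    for b in t:
        not_duplicate_of_map[b].append(s)

# Book 2 is in both groups, so its merged set is the union
print(not_duplicate_of_map.merge_sets(2))   # {1, 2, 3, 5}
# A book with no exemptions yields an empty set, without creating an entry
print(not_duplicate_of_map.merge_sets(42))  # set()
# Menu enabling stays a plain containment test on the dict itself
print(2 in not_duplicate_of_map)            # True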
Quote:
I think 1000 members of a group is a reasonable limit anyway? To get that many members you must either have a really crappy set of book metadata, like all the titles unknown, or be storing something like magazines where the majority of the name is identical. It is trivial for a user to exclude magazines or whatever by using a search restriction if they don't want them to appear in the results of a soundex or fuzzy search.
Not convinced. I can see someone doing a soundex search, getting 4000 matches in the group, looking through them, and deciding that they are all non-dups. Highlight them all and bang! add the exemption.
Don't add the limit until we have an idea of what the performance might be. I think it will be acceptable, even with large exemption sets.
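If we want numbers before deciding either way, a quick timeit sketch along these lines would do. This is only a sketch, assuming the ExemptionMap class from above, with group sizes invented to mimic the 4000-match soundex scenario; actual timings will vary by machine.
Code:
import timeit

# Four overlapping 4000-member groups, so merge_sets must take a real
# union rather than hitting the single-set fast path
groups = [set(range(i, i + 4000)) for i in (0, 500, 1000, 1500)]
m = ExemptionMap(list)
for s in groups:
    for b in s:
        m[b].append(s)

# Book 1999 sits in all four groups
total = timeit.timeit(lambda: m.merge_sets(1999), number=1000)
print('%.3f ms per merge_sets call' % (total / 1000 * 1000))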