Quote:
Originally Posted by chaley
One thing that has bitten me is using the 'in' operator on lists. The operator does a linear search! One piece of code I wrote improved in performance by two orders of magnitude when I changed the list to a set, which does hashed lookups. Sometimes I use a dict with a fixed value (e.g. True) for the same thing, because they are hashed as well.
|
Ahh, that is interesting. I had wondered why I had seen set being used so much in Calibre code, in particular in places where I knew the items being added were distinct so the "obvious" advantage of a set didn't stand out.
I've just sat down to take a look into this. You might be interested in the problem code actually - as it is based on your example from your post
here a while ago.
If you take your example and change your "initial_dups = [2, 3, 66, 7, 10, 11, 12]" to "initial_dups = [i for i in xrange(1,100)]" you will see it runs forever...
I'm sure there are optimisations that can be done to it, but it doesn't seem to scale well in its current guise.