Worth noting that Counter itself uses what the article calls the "get method", but with a common performance optimization (caching a bound method):
    def _count_elements(mapping, iterable):
        mapping_get = mapping.get
        for elem in iterable:
            mapping[elem] = mapping_get(elem, 0) + 1
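A runnable sketch of the same pattern (using a hypothetical standalone name, since the real `_count_elements` is a private helper in the collections module): the point is that the bound `dict.get` is looked up once, instead of doing an attribute lookup on every iteration.

```python
def count_elements(mapping, iterable):
    # Cache the bound dict.get once, instead of resolving
    # mapping.get on every loop iteration.
    mapping_get = mapping.get
    for elem in iterable:
        mapping[elem] = mapping_get(elem, 0) + 1
    return mapping

counts = count_elements({}, ["brown", "red", "brown"])
```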
The author seems to misunderstand one part of The Zen of Python:
| Simple is better than complex.
by saying:
> Our code is more complex (O(n²) instead of O(n)), less beautiful, and less readable
The Zen is not about computation complexity! It's about complexity of the source code.
The code in question is:
    color_counts = dict((c, colors.count(c)) for c in set(colors))
And while I agree that this is inefficient and shouldn't be used in a library, I find this to be very readable and would always prefer that code if I know that I'm dealing only with small lists. It translates neatly to a natural-language description of the problem: give me a dictionary that maps each distinct color to the number of times it occurs in the list.
So I don't think this violates the guideline "Simple is better than complex". Rather, it is a good example of where it makes sense to introduce additional complexity to improve the performance of an often-used helper function.

The try block method would be considered bad in most languages, and I hope it is considered bad in Python as well. Using exception handling as part of a normal flow of control is bad style, bad taste, and bad for performance.
EDIT: I'm glad to see the Python documentation addresses this: https://docs.python.org/2/faq/design.html#how-fast-are-excep...
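For reference, the try/except counting idiom being criticized looks roughly like this (a sketch; the article's exact variable names may differ):

```python
colors = ["brown", "red", "green", "yellow", "yellow", "brown", "brown", "black"]

# EAFP style: attempt the increment and handle the missing key.
color_counts = {}
for color in colors:
    try:
        color_counts[color] += 1
    except KeyError:
        color_counts[color] = 1
```

The exception path is only taken the first time each key appears, which is why its cost depends heavily on how many distinct keys the data has.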
> Per the Zen of Python, “there should be one– and preferably only one– obvious way to do it”. This is an aspirational message.
"aspirational" is an optimistic way to describe it. As the article illustrates, every ~2 years the "pythonic" approach is different. Innovation is good, but it hurts re-use of code and of skills.
I was expecting a larger difference in times; the largest spread is about 3.5x. https://gist.github.com/treyhunner/0987601f960a5617a1be
Nice article.
This makes me appreciate autovivification and casting in perl so that you can just say "$color_counts{$color} += 1" without all the initialization.
    $ txr
    This is the TXR Lisp interactive listener of TXR 123.
    Use the :quit command or type Ctrl-D on empty line to exit.
    1> [hash-update [group-by identity
                     '(brown red green yellow yellow
                       brown brown black)]
                    length]
    #H(() (green 1) (red 1) (brown 3) (black 1) (yellow 2))
Form a hash by grouping like items into lists. The identity function is the key in the hash and the basis for equality, so the keys are colors, and the values are lists of colors. Then update the hash values by passing them through the length function.
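A rough Python analogue of this group-then-measure idiom (not TXR, just a sketch of the same two-step idea):

```python
colors = ["brown", "red", "green", "yellow", "yellow", "brown", "brown", "black"]

# Step 1: group like items into lists keyed by the item itself.
groups = {}
for c in colors:
    groups.setdefault(c, []).append(c)

# Step 2: replace each group with its length.
color_counts = {color: len(group) for color, group in groups.items()}
```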
Would be interesting doing the same exercise for Java as well. We've been through Hashtable, HashMap, basic for loops, Enumeration, Iterator, foreach loops, generics, and now we have lambdas.
just for fun...
    import itertools

    colors = ["brown", "red", "green", "yellow", "yellow", "brown", "brown", "black"]
    dict([(color, len(list(grp))) for color, grp in itertools.groupby(sorted(colors))])

or

    dict([(color, len(list(filter(lambda c: c == color, colors)))) for color in set(colors)])

(The list() around filter matters on Python 3, where filter returns an iterator with no len().)
...because sometimes job security is important too.

I thought this was going to end around 2.4 with the .count(c) method, since you don't really need the dict at that point.
collections.Counter() was the slowest option according to this benchmark from 2010: http://stackoverflow.com/questions/2522152/python-is-a-dicti...
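Benchmarks like that are easy to re-run; here is a minimal timeit sketch comparing Counter against a plain dict.get loop (actual numbers will vary by Python version, input size, and key diversity):

```python
import timeit

setup = (
    "from collections import Counter\n"
    'colors = ["brown", "red", "green", "yellow"] * 1000\n'
)

stmts = {
    "Counter": "Counter(colors)",
    "dict.get loop": (
        "d = {}\n"
        "for c in colors:\n"
        "    d[c] = d.get(c, 0) + 1"
    ),
}

# Time each statement 100 times over the same 4000-element list.
results = {name: timeit.timeit(stmt, setup=setup, number=100)
           for name, stmt in stmts.items()}
for name, seconds in results.items():
    print(f"{name}: {seconds:.3f}s")
```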
Great blog post Trey!
very cool
I like history. Thanks for the history information!
What a beautiful, understated piece of writing. This particular problem seems to come up a lot when thinking about how to use new Python classes (defaultdict, Counter) and affordances (list comprehensions, +=). It's nice to see it captured in timeline format.