Worth noting that Counter itself uses what the article calls the "get method", but with a common performance optimization (caching a bound method):
    def _count_elements(mapping, iterable):
        mapping_get = mapping.get
        for elem in iterable:
            mapping[elem] = mapping_get(elem, 0) + 1
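A runnable sketch of the same pattern (using a hypothetical standalone name, since the real `_count_elements` is a private helper in the collections module): the point is that the bound `dict.get` is looked up once, instead of doing an attribute lookup on every iteration.

```python
def count_elements(mapping, iterable):
    # Cache the bound dict.get once, instead of resolving
    # mapping.get on every loop iteration.
    mapping_get = mapping.get
    for elem in iterable:
        mapping[elem] = mapping_get(elem, 0) + 1
    return mapping

counts = count_elements({}, ["brown", "red", "brown"])
```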
The author seems to misunderstand one part of The Zen of Python:
| Simple is better than complex.
by saying:
> Our code is more complex (O(n²) instead of O(n)), less beautiful, and less readable
The Zen is not about computation complexity! It's about complexity of the source code.
The code in question is:
    color_counts = dict((c, colors.count(c)) for c in set(colors))
And while I agree that this is inefficient and shouldn't be used in a library, I find this to be very readable and would always prefer that code if I know that I'm dealing only with small lists. It translates neatly to a natural-language description of the problem: give me a dictionary that maps each distinct color to the number of times it occurs in the list.
So I don't think this violates the guideline "Simple is better than complex". Rather, it is a good example of where it makes sense to introduce additional complexity to improve the performance of an often-used helper function.

The try block method would be considered bad in most languages, and I hope it is considered bad in Python as well. Using exception handling as part of a normal flow of control is bad style, bad taste, and bad for performance.
EDIT: I'm glad to see the Python documentation addresses this: https://docs.python.org/2/faq/design.html#how-fast-are-excep...
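For reference, the try/except counting idiom being criticized looks roughly like this (a sketch; the article's exact variable names may differ):

```python
colors = ["brown", "red", "green", "yellow", "yellow", "brown", "brown", "black"]

# EAFP style: attempt the increment and handle the missing key.
color_counts = {}
for color in colors:
    try:
        color_counts[color] += 1
    except KeyError:
        color_counts[color] = 1
```

The exception path is only taken the first time each key appears, which is why its cost depends heavily on how many distinct keys the data has.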
> Per the Zen of Python, “there should be one– and preferably only one– obvious way to do it”. This is an aspirational message.
"aspirational" is an optimistic way to describe it. As the article illustrates, every ~2 years the "pythonic" approach is different. Innovation is good, but it hurts re-use of code and of skills.
I was expecting a larger difference in times; the largest spread is about 3.5x. https://gist.github.com/treyhunner/0987601f960a5617a1be
Nice article.
This makes me appreciate autovivification and casting in perl so that you can just say "$color_counts{$color} += 1" without all the initialization.
    $ txr
    This is the TXR Lisp interactive listener of TXR 123.
    Use the :quit command or type Ctrl-D on empty line to exit.
    1> [hash-update [group-by identity
                     '(brown red green yellow yellow
                       brown brown black)]
                    length]
    #H(() (green 1) (red 1) (brown 3) (black 1) (yellow 2))
Form a hash by grouping like items into lists. The identity function is the key in the hash and the basis for equality, so the keys are colors, and the values are lists of colors. Then update the hash values by passing them through the length function.
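A rough Python analogue of this group-then-measure idiom (not TXR, just a sketch of the same two-step idea):

```python
colors = ["brown", "red", "green", "yellow", "yellow", "brown", "brown", "black"]

# Step 1: group like items into lists keyed by the item itself.
groups = {}
for c in colors:
    groups.setdefault(c, []).append(c)

# Step 2: replace each group with its length.
color_counts = {color: len(group) for color, group in groups.items()}
```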
Would be interesting doing the same exercise for Java as well. We've been through Hashtable, HashMap, basic for loops, Enumeration, Iterator, foreach loops, generics, and now we have lambdas.
just for fun...
    import itertools

    colors = ["brown", "red", "green", "yellow", "yellow", "brown", "brown", "black"]
    dict([(color, len(list(grp))) for color, grp in itertools.groupby(sorted(colors))])

or

    dict([(color, len(list(filter(lambda c: c == color, colors)))) for color in set(colors)])

(The list() around filter matters on Python 3, where filter returns an iterator with no len().)
...because sometimes job security is important too.

I thought this was going to end around 2.4 with the .count(c) method, since you don't really need the dict at that point.
collections.Counter() was the slowest option according to this benchmark from 2010: http://stackoverflow.com/questions/2522152/python-is-a-dicti...
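Benchmarks like that are easy to re-run; here is a minimal timeit sketch comparing Counter against a plain dict.get loop (actual numbers will vary by Python version, input size, and key diversity):

```python
import timeit

setup = (
    "from collections import Counter\n"
    'colors = ["brown", "red", "green", "yellow"] * 1000\n'
)

stmts = {
    "Counter": "Counter(colors)",
    "dict.get loop": (
        "d = {}\n"
        "for c in colors:\n"
        "    d[c] = d.get(c, 0) + 1"
    ),
}

# Time each statement 100 times over the same 4000-element list.
results = {name: timeit.timeit(stmt, setup=setup, number=100)
           for name, stmt in stmts.items()}
for name, seconds in results.items():
    print(f"{name}: {seconds:.3f}s")
```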
Great blog post Trey!
very cool
I like history. Thanks for the history information!
What a beautiful, understated piece of writing. This particular problem seems to come up a lot when thinking about how to use new Python classes (defaultdict, Counter) and affordances (list comprehensions, +=). It's nice to see it captured in timeline format.