It would have been nice to see whether the changes are actually noticeable. Yes, one version may be faster than another, but simply ordering a set of tests like that doesn't show whether it's worth the trouble of doing anything about it.
> All tests were run on AWS from an m3.medium EC2 instance
So you're comparing running times measured on a shared host? That's generally not considered best practice if you want meaningful numbers.
Unsurprised at the superior performance of GCC, but I am surprised that ruby ships with -O3. Why would they choose that optimization level?
This ranking uses a method called the Borda count (https://en.wikipedia.org/wiki/Borda_count). It can lead to quite arbitrary results, for a number of reasons. One example is that being 9th earns three times as many points as being 11th, whereas being first is only marginally better (relatively speaking) than being third.
Better methods are readily available, for example the Schulze method (https://en.wikipedia.org/wiki/Schulze_method). I wonder how much these rankings would change...
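To make the point concrete, here's a minimal sketch of a Borda-style tally. The compiler names and per-test orderings are made up for illustration, not taken from the article; the scoring (one point per compiler beaten in each test, summed across tests) is the part that matters.

```python
# Borda-style tally: in each test, a compiler earns one point per
# compiler it beats; points are summed across tests.
# The rankings below are invented for illustration only.
from collections import defaultdict

rankings = {
    "test_a": ["gcc-4.9", "gcc-4.8", "clang-3.5", "clang-3.4"],
    "test_b": ["gcc-4.8", "clang-3.5", "gcc-4.9", "clang-3.4"],
}

scores = defaultdict(int)
for order in rankings.values():
    for rank, compiler in enumerate(order):
        scores[compiler] += len(order) - 1 - rank

for compiler, points in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(compiler, points)
```

Because the points are a linear function of rank, a fixed gap in positions counts for much more, relatively, near the bottom of the field than near the top.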
One thing I'm wondering about is whether running the tests on FreeBSD 10.1 would make a difference.
The core FreeBSD 10 system is compiled with clang. Since Ruby uses system libraries, the question is whether a clang- vs. gcc-compiled Ruby runtime would produce different results on a clang-compiled system vs. a gcc-compiled one.
Hard to know how it would matter, but it does seem conceivable that it might.
It would be interesting to see whether -Os improves performance over -O2, especially for GCC 4.9.
What compiler optimisation levels were used for Clang?
Doing the comparison as a ranking is bad, as the result can change if you add or remove compilers. For example, with two tests T and U and four compilers C, D, E and F:
T: C D E F
U: D E F C
Looking at all four, D is better than C. If you hadn't looked at E and F, the two would tie.
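A quick sketch of the same toy example (assuming a Borda-style tally, one point per compiler beaten in each test) shows how the outcome depends on which compilers are in the field:

```python
# Borda tally over the two toy tests T and U; dropping E and F
# changes the relative standing of C and D.
from collections import defaultdict

def borda(tests):
    scores = defaultdict(int)
    for order in tests:
        for rank, c in enumerate(order):
            scores[c] += len(order) - 1 - rank
    return dict(scores)

all_four = [["C", "D", "E", "F"], ["D", "E", "F", "C"]]
just_two = [[c for c in order if c in ("C", "D")] for order in all_four]

print(borda(all_four))  # {'C': 3, 'D': 5, 'E': 3, 'F': 1} -- D beats C
print(borda(just_two))  # {'C': 1, 'D': 1} -- C and D tie
```

So the relative ranking of two compilers can change merely by adding or removing unrelated candidates, which is exactly the objection to using this kind of aggregate ranking here.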
Similar tests (and results) for Postgres http://blog.pgaddict.com/posts/compiler-optimization-vs-post...