This reveals one weakness of the Windows development model: if something isn’t a feature with a PM driving it, it won’t happen. On the Linux side, on the other hand, if some obscure internal thing isn’t optimal yet, you can bet some obsessed hacker will tackle it one day. How many schedulers has Linux had already?
In my experience, for a long time the Windows NT kernel and scheduler (from Win2K onwards) were actually better than Linux's in several ways. That always amazed me, because Linux was the better server OS in many other ways.
Now, at 64 cores and above, it is clear that the Linux developers have spent a lot of time making the kernel scale better. It may have something to do with the fact that a big proportion of production servers with many CPUs/cores run Linux, so they started investing in this quite early.
Honestly, I'm a little surprised it's as close as it is. I have consistently hated having to deploy anything that requires lots of cores on a Windows machine.
I have been keeping an eye on DragonFlyBSD for years now, and it does some very interesting things, so this:
> Coming up next I will be looking at the FreeBSD / DragonFlyBSD performance on the Threadripper 3990X
has me excited.
I'd like to see comparisons of compilation time. I wish there were a standard for benchmarking CPUs by compilation time. I know a build of the Firefox source code is used quite often, as is the Linux kernel; I just wish it were more prevalent in these reviews.
The Linux kernel has been run on "big iron" for a long time now; it would be surprising if it weren't better prepared for scaling to 128+ cores.
linux/Documentation/vm/numa.rst states the work was started in 1999; was Windows going anywhere near NUMA architectures back then?
These are all embarrassingly parallel multiplication workloads. It would be nice, for a change, if someone ran something like MySQL or a gRPC server: a workload where it actually makes a difference how threads get scheduled as they go to sleep, wake up, and react to arriving packets.
With no clear explanation for the wildly varying results between different benchmarks, I wonder if the analysis is flawed.
Were those programs built with the same toolchain? Could it be that some library used by the lagging ones is causing the problem?
Looking at the results makes me wonder if MS is keeping separate branches of Win 10 internally, or if some CPU-hogging services are disabled in the Win 10 Enterprise edition.
So, to get max perf from the Windows kernel, the software should use the I/O completion port API rather than regular threads/locks.
https://docs.microsoft.com/en-us/windows/win32/fileio/i-o-co...
However, any software that does that will likely NOT be cross-platform.
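For what it's worth, here's a minimal sketch of the usual IOCP worker-pool pattern (my own illustration, not code from the article or the benchmark suite); the handle association and the actual overlapped reads/writes are elided:

```c
#include <windows.h>
#include <stdlib.h>

/* Worker thread: drains completion packets from the shared port. */
static DWORD WINAPI worker(LPVOID param)
{
    HANDLE port = (HANDLE)param;
    DWORD bytes;
    ULONG_PTR key;
    LPOVERLAPPED ov;

    for (;;) {
        /* Blocks until a completion packet arrives; the kernel wakes
           waiting threads in LIFO order to keep caches warm. */
        if (!GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE))
            break;
        if (ov == NULL)          /* shutdown packet posted from main() */
            break;
        /* ... process the completed I/O identified by key / ov ... */
    }
    return 0;
}

int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    DWORD nthreads = si.dwNumberOfProcessors;   /* current group only */

    /* One completion port shared by all workers; the last argument caps
       how many threads the kernel lets run concurrently (0 = #CPUs). */
    HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);

    HANDLE *threads = (HANDLE *)malloc(nthreads * sizeof(HANDLE));
    for (DWORD i = 0; i < nthreads; i++)
        threads[i] = CreateThread(NULL, 0, worker, port, 0, NULL);

    /* ... associate file/socket handles with the port via
       CreateIoCompletionPort(handle, port, key, 0) and issue
       overlapped ReadFile/WSARecv calls here ... */

    /* Tell each worker to exit, then wait for them all. */
    for (DWORD i = 0; i < nthreads; i++)
        PostQueuedCompletionStatus(port, 0, 0, NULL);
    for (DWORD i = 0; i < nthreads; i++) {
        WaitForSingleObject(threads[i], INFINITE);
        CloseHandle(threads[i]);
    }
    free(threads);
    CloseHandle(port);
    return 0;
}
```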
In addition, if you want to benchmark the kernel, you should run against a RAM disk and not an SSD.
They're using `Clear Linux 32280`, which is a distro produced by Intel.
Presumably it was built using the Intel compiler, which specifically penalizes AMD CPUs.
That would explain the advantage Windows has at low core counts.
There is no mention in the article of whether the software suite was vetted for support of more than 64 threads on Win32. The API has a peculiar weakness: by default, a process's threads are scheduled within a single processor group, and a group can contain no more than 64 hardware threads. To get above this limit, the application must explicitly adjust the group affinity of its threads to include the additional hardware threads. MS was not in a hurry to adjust the C++ STL and their OpenMP runtime after the basic processor group API appeared in Windows 7, and I am not sure if they have managed to do it by now. Some of the benchmark results look to me as if the missing scaling from 64 to 128 hardware threads on Windows might be caused by this.
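For illustration, here's a rough sketch (mine, not from the article or any specific benchmark) of what opting in to multiple processor groups looks like with the Windows 7+ API: the process enumerates the groups and pins each worker thread to a specific group, since by default every thread stays in the single group the process started in.

```c
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

/* Each worker pins itself to the processor group passed as its parameter,
   then runs on whichever logical processors that group contains.
   Requires Windows 7+ (processor group APIs). */
static DWORD WINAPI worker(LPVOID param)
{
    GROUP_AFFINITY aff = {0};
    aff.Group = (WORD)(ULONG_PTR)param;
    aff.Mask  = ~(KAFFINITY)0;         /* any processor in that group */
    if (!SetThreadGroupAffinity(GetCurrentThread(), &aff, NULL))
        return 1;

    /* ... the actual work goes here ... */
    return 0;
}

int main(void)
{
    WORD  groups = GetActiveProcessorGroupCount();               /* 2 on a 3990X */
    DWORD total  = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
    printf("%u groups, %lu logical processors\n",
           (unsigned)groups, (unsigned long)total);

    /* One thread per logical processor, spread across all groups. Without
       the SetThreadGroupAffinity call above, every thread would stay in
       the process's starting group, i.e. at most 64 hardware threads. */
    HANDLE *threads = (HANDLE *)malloc(total * sizeof(HANDLE));
    DWORD n = 0;
    for (WORD g = 0; g < groups; g++) {
        DWORD in_group = GetActiveProcessorCount(g);
        for (DWORD i = 0; i < in_group; i++)
            threads[n++] = CreateThread(NULL, 0, worker,
                                        (LPVOID)(ULONG_PTR)g, 0, NULL);
    }

    for (DWORD i = 0; i < n; i++) {
        WaitForSingleObject(threads[i], INFINITE);
        CloseHandle(threads[i]);
    }
    free(threads);
    return 0;
}
```

A runtime (STL thread pool, OpenMP) that never makes this kind of call simply never sees the second group, which would look exactly like the flat 64-to-128-thread scaling in some of these results.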