In a similar vein, Andrew Kelley, the creator of Zig, gave a nice talk about designing programs around the different speeds of different CPU operations: Practical Data-Oriented Design https://vimeo.com/649009599
In case you are wondering about your cache-line size on a Linux box, you can find it in sysfs with something like:
cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
Wait wait wait.
M2 processors have 128-byte cache lines?? That's a big deal. We've been at 64 bytes since what, the Pentium?
Something I've experienced first hand. Programming the PS3 forced you to manually do what CPU caches do in the background, which is why the PS3 was a pain in the butt for programmers who were used to object-oriented style programming.
It forced you to think in terms of: [array of input data -> operation -> array of intermediate data -> operation -> array of final output data]
Our OOP game engine had to transform its OOP data into arrays of input data before feeding them into operations, which meant a lot of unnecessary memory copies. We had to break objects into "operations", which was not intuitive. But that got rid of a lot of memory copies, and only then did we manage to get decent performance.
The good thing: by doing this we also got an automatic performance increase on the Xbox 360, because we were (consciously? unconsciously?) optimizing for cache usage.
I learned so much from this blog and from the discussion. HN is so awesome. +1 for learning about lscpu -C here.
A while back I had to create a high-speed streaming data processor (not a Spark cluster or similar creature), but a C program that could sit in-line in a high-speed data stream, match specific patterns, and take actions based on the type of pattern that hit. As part of optimizing for speed and throughput, a colleague and I did an obnoxious level of experimentation with read sizes (slurps of data) to minimize IO wait queues and memory pressure. Being aligned with the cache-line size, either 1x or 2x, was the winner. Good low-level, close-to-the-hardware C fun for sure.
I think cache coherency protocols are less intuitive and less talked about when people discuss caching, so it would be nice to have some discussion on that too.
But otherwise this is a good general overview of how caching is useful.
Great article. I have always had an open question in my mind about struct alignment and this explained it very succinctly.
Really cool stuff and a nice introduction, but I'm curious how much modern compilers already do for you. Especially if you shift to the JIT world: what ends up being the difference between code where people optimize for this vs code written in a style optimized for readability/reuse/etc.?
"On the other hand, data coming from main memory cannot be assumed to be sequential and the data cache implementation will try to only fetch the data that was asked for."
Not correct. Prefetching has been around for a while, and it is rather important in optimization.
Why is the natural alignment of structs equal to the size of their largest member?
Super interesting. Thank you!
Drepper's "What Every Programmer Should Know About Memory" [1] is a good resource on a similar topic. Not so long ago, there was an analysis done on it in a series of blog posts [2] from a more modern perspective.
[1] https://people.freebsd.org/~lstewart/articles/cpumemory.pdf
[2] https://samueleresca.net/analysis-of-what-every-programmer-s...