> After running benchmarks with all the variants and planets, the improvement is about 9% to 12%.
Pretty weak speedup. Maybe a straightforward n-body implementation would get closer to the theoretical 8x speedup.
> AVX functions start with _mm256_
I don't know anything about Rust, but a more accurate word is probably "intrinsics". They usually compile to a single instruction.
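For anyone who hasn't seen what this looks like in Rust, here is a minimal sketch using `std::arch::x86_64`. The function names `add8` and `add8_avx` are made up for illustration. It adds eight `f32` values at once, with `_mm256_add_ps` as the intrinsic that maps to a single `vaddps`. AVX availability is checked at runtime, and a scalar fallback is used otherwise.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256, _mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};

/// Hypothetical helper: lane-wise add of two 8-element f32 arrays.
/// Uses AVX when the CPU supports it, otherwise a plain scalar loop.
pub fn add8(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just confirmed at runtime.
            return unsafe { add8_avx(a, b) };
        }
    }
    // Scalar fallback for non-AVX CPUs or non-x86_64 targets.
    std::array::from_fn(|i| a[i] + b[i])
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add8_avx(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    // Unaligned loads: eight f32 values fill one 256-bit register.
    let va: __m256 = _mm256_loadu_ps(a.as_ptr());
    let vb: __m256 = _mm256_loadu_ps(b.as_ptr());
    // One intrinsic call, typically one vaddps instruction.
    let sum = _mm256_add_ps(va, vb);
    let mut out = [0.0f32; 8];
    _mm256_storeu_ps(out.as_mut_ptr(), sum);
    out
}
```

The eight lanes of `f32` per 256-bit register are also where the "8x theoretical" figure comes from. With `f64` you only get four lanes.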
This looks kinda gross to me. Do the Rust developers not want to emulate what ispc and CUDA do? Writing intrinsics by hand is not what I expect from a 2019 language.
You may enjoy my video tutorial on SIMD Intrinsics as well:
https://www.youtube.com/watch?v=4Gs_CA_vm3o
I also use Rust, but it's perfectly fine for learning about intrinsics in C/C++ or .NET as well. I cover some of the fundamental strategies for using them well: how to lay out data in memory, how to deal with branches, etc.
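To give a rough idea of those two strategies, here is a sketch in Rust. The `Particles` struct and its `reflect` method are hypothetical examples and are not taken from the video. The data is stored as a struct of arrays, so eight consecutive x positions load into one register. The `if x < 0 { v = -v }` branch is replaced with a compare mask plus a blend, so every lane runs the same instructions.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical struct-of-arrays layout: all positions contiguous,
/// all velocities contiguous, instead of a Vec of {x, v} structs.
pub struct Particles {
    pub x: Vec<f32>,
    pub v: Vec<f32>,
}

/// Scalar reference: negate the velocity of every particle with x < 0.
fn reflect_scalar(x: &[f32], v: &mut [f32]) {
    for (xi, vi) in x.iter().zip(v.iter_mut()) {
        if *xi < 0.0 {
            *vi = -*vi;
        }
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn reflect_avx(x: &[f32], v: &mut [f32]) {
    let n = x.len().min(v.len());
    let zero = _mm256_setzero_ps();
    let sign_bit = _mm256_set1_ps(-0.0); // only the sign bit is set
    let mut i = 0;
    while i + 8 <= n {
        let vx = _mm256_loadu_ps(x.as_ptr().add(i));
        let vv = _mm256_loadu_ps(v.as_ptr().add(i));
        // Mask lanes: all bits set where x < 0, all zero elsewhere.
        let mask = _mm256_cmp_ps::<_CMP_LT_OQ>(vx, zero);
        // Flip the sign bit to negate every lane.
        let neg = _mm256_xor_ps(vv, sign_bit);
        // Pick `neg` where the mask is set, keep `vv` otherwise. No branch.
        let out = _mm256_blendv_ps(vv, neg, mask);
        _mm256_storeu_ps(v.as_mut_ptr().add(i), out);
        i += 8;
    }
    // Fewer than 8 elements left: finish with the scalar path.
    reflect_scalar(&x[i..n], &mut v[i..n]);
}

impl Particles {
    pub fn reflect(&mut self) {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx") {
                // SAFETY: AVX support was checked at runtime.
                unsafe { reflect_avx(&self.x, &mut self.v) };
                return;
            }
        }
        reflect_scalar(&self.x, &mut self.v);
    }
}
```

With an array of `{x, v}` structs, you would have to gather every other float before doing any SIMD work. The struct-of-arrays layout makes each load a single contiguous read.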