Load-Store Conflicts

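  • I find Clang generally a bit too eager to combine loads. This is especially bad when returning structs through the stack: the callee typically writes the fields piecemeal and returns, and then the caller often wants to copy the struct from the stack to somewhere else, which it does with SIMD loads/stores. Each of those wide loads spans several of the narrower stores that were just issued, and a load like that generally cannot be satisfied by store-to-load forwarding, so it has to wait for the stores to drain.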

    This is a significant problem on AMD; Intel and Apple seem to be better.

  • A very interesting article that goes deeper into store-to-load forwarding than anything I’ve read before.
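
To make the struct-return pattern from the first bullet concrete, here is a minimal sketch in C. The names `Quad`, `make_quad` and `store_quad` are hypothetical, and the `noinline` attribute (a GCC/Clang extension) is only there to keep the call boundary visible. Whether the compiler actually emits a wide SIMD copy in the caller depends on the target, the ABI and the optimization level, so treat this as an illustration of the shape of the code rather than a guaranteed reproduction.

```c
#include <stdint.h>

/* Hypothetical 32-byte struct: big enough that common ABIs return it
 * through memory rather than in registers. */
typedef struct {
    uint64_t a, b, c, d;
} Quad;

/* The callee fills in the result piecemeal, i.e. as four separate
 * 8-byte stores into the return slot. */
__attribute__((noinline))
Quad make_quad(uint64_t x) {
    Quad q;
    q.a = x;
    q.b = x + 1;
    q.c = x * 2;
    q.d = x ^ 0xff;
    return q;
}

/* The caller copies the returned struct somewhere else. A compiler may
 * implement this copy with 16- or 32-byte SIMD loads from the stack, each
 * of which overlaps several of the 8-byte stores made just before. */
void store_quad(Quad *dst, uint64_t x) {
    *dst = make_quad(x);
}
```

Compiling this with optimizations and inspecting the assembly for `store_quad` is one way to check whether a given compiler version combines the loads on your target.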