Dynamic Automatic Differentiation of GPU Broadcast Kernels [pdf]

  • Author here; the arXiv version can be found at https://arxiv.org/abs/1810.08297. It's not much different from the version OP linked, but it includes citations to other interesting Julia AD/TPU-related papers that use this technique.

    Happy to answer any questions, at least until I turn in for the night :)