Transformers Can Do Bayesian Inference

  • Isn’t that what logistic activation functions are all about?