Arrows of Time for Large Language Models

  • Isn't it obvious that, since LLMs are trained to predict the next word, they do better at that than at predicting the previous one?

  • Is there a link with entropy creation?
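
On the first question, a small numerical check may help show why the answer is not obvious. By the chain rule of probability, the same joint distribution over sequences can be factorized left to right or right to left, and with exact conditionals both factorizations give the same total log-loss. The sketch below is only an illustration under stated assumptions: the toy distribution over binary sequences of length 3 and the helper name chain_rule_loss are hypothetical and do not come from the source.

```python
import itertools
import math
import random


def chain_rule_loss(joint, order):
    """Expected total log-loss (in nats) when positions are predicted in `order`,
    using the exact conditionals of the joint distribution `joint`."""
    total = 0.0
    for seq, p in joint.items():
        for k in range(len(order)):
            seen = order[:k]        # positions already revealed (the context)
            upto = order[:k + 1]    # context plus the position being predicted
            # Marginal probability of the context alone.
            p_ctx = sum(q for s, q in joint.items()
                        if all(s[i] == seq[i] for i in seen))
            # Marginal probability of the context together with the target.
            p_ctx_tgt = sum(q for s, q in joint.items()
                            if all(s[i] == seq[i] for i in upto))
            # Add -p(seq) * log p(target | context).
            total -= p * math.log(p_ctx_tgt / p_ctx)
    return total


# Hypothetical toy data: a random joint distribution over binary sequences of length 3.
rng = random.Random(0)
seqs = list(itertools.product([0, 1], repeat=3))
weights = [rng.random() for _ in seqs]
z = sum(weights)
joint = {s: w / z for s, w in zip(seqs, weights)}

forward = chain_rule_loss(joint, [0, 1, 2])    # predict the next symbol
backward = chain_rule_loss(joint, [2, 1, 0])   # predict the previous symbol
joint_entropy = -sum(p * math.log(p) for p in joint.values())

print(forward, backward, joint_entropy)
```

In this toy setting the two totals agree with each other and with the joint entropy up to floating-point error, so any gap between forward and backward models would have to come from what is learnable in practice, not from the prediction objective itself.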