Isn't it obvious that, since LLMs are trained to predict the next word, they do better at that than at predicting the previous one?
Is there a link with entropy creation?
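One way to make "do better" concrete is to compare average per-token cross-entropy (in nats, i.e. an entropy-like quantity) in the two directions. Below is a minimal sketch, assuming the HuggingFace transformers library and the small "gpt2" checkpoint; note that scoring the reversed token sequence with a forward-trained model is only a crude proxy, since a proper comparison would train a separate backward model on the same data.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Illustrative model choice; any causal LM checkpoint would do.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_cross_entropy(token_ids):
    """Mean next-token cross-entropy (nats/token) over the sequence."""
    ids = torch.tensor([token_ids])
    with torch.no_grad():
        out = model(ids, labels=ids)  # loss = mean shifted cross-entropy
    return out.loss.item()

text = "The cat sat on the mat because it was warm."
ids = tokenizer.encode(text)

forward_ce = avg_cross_entropy(ids)
# Crude backward proxy: score the same tokens in reverse order.
# This is out-of-distribution for a forward-trained model, so it
# overstates the gap compared with a genuinely backward-trained LM.
backward_ce = avg_cross_entropy(list(reversed(ids)))

print(f"forward : {forward_ce:.3f} nats/token")
print(f"reversed: {backward_ce:.3f} nats/token")
```

If the forward score is consistently lower even for a model trained in the backward direction, that asymmetry is exactly the kind of entropy-related effect the question is asking about.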