It's been known for a long time that text completion is what's called 'AI-complete': if you have full AGI, it can do human-level text completion, and if you have human-level text completion, you can get full AGI out of it. So they found a way, using an obscene number of model parameters, obscene compute, and an obscene dataset, to get really, really good at text completion. Now they have systems that, looking back, we're going to call just AGI. In simpler words: it works because the computers' brains got so big that they are now conscious like you and me.
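To make the 'AI-complete' reduction concrete, here's a toy sketch (my own framing, not from any paper): any task you can state in text reduces to filling in a completion, so a perfect completer is, in effect, a general problem solver. `complete` here is a hypothetical stand-in for whatever model you'd actually call.

    # Hypothetical reduction of an arbitrary task to text completion.
    # `complete` stands in for any text-completion model.
    def solve_any_task(task_description: str, complete) -> str:
        prompt = (
            "Below, an expert solves the stated task correctly.\n"
            f"Task: {task_description}\n"
            "Expert solution:"
        )
        return complete(prompt)

    # Trivial stand-in completer, just to show the call shape:
    print(solve_any_task("Translate 'bonjour' to English.",
                         lambda p: " hello"))

The point is only that the interface is universal: if the completer is actually human-level, the wrapper above solves anything you can describe.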
I would love to hear everyone's input on this question as well!
> Are there papers that explore what kinds of concepts the model is actually building/learning in those heads and layers?
> There are large teams who spend months tuning those models. Do those teams have access to those internal concepts that the model built up and organized? Is any of this work public?
See: https://openai.com/research/language-models-can-explain-neur...
My understanding: generally, the models are learning to compress all the text they were fed during pre-training, and in doing so they pick up higher-order concepts, because those concepts are exactly what makes the compression better: more compressed, with less loss.
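To make the compression framing concrete, here's a toy sketch (my own, not from the OpenAI work linked above) of the standard prediction-is-compression connection: a model's cross-entropy on text is, via arithmetic coding, roughly the number of bits needed to store that text, so a model that captures more structure yields a shorter encoding. The sample text and both toy models are made up for illustration.

    # Toy demo: better prediction = better compression.
    # Cross-entropy in bits is the code length an arithmetic coder would achieve.
    import math
    from collections import Counter, defaultdict

    text = "the cat sat on the mat. the cat sat on the hat."

    # Unigram model: P(c) from character frequencies.
    counts = Counter(text)
    total = len(text)
    unigram_bits = -sum(math.log2(counts[c] / total) for c in text)

    # Bigram model: P(c | previous char), with add-one smoothing.
    pair_counts = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        pair_counts[prev][cur] += 1
    alphabet = set(text)

    def bigram_prob(prev, cur):
        row = pair_counts[prev]
        return (row[cur] + 1) / (sum(row.values()) + len(alphabet))

    # First character coded with the unigram model, the rest conditionally.
    bigram_bits = -math.log2(counts[text[0]] / total)
    bigram_bits -= sum(math.log2(bigram_prob(p, c)) for p, c in zip(text, text[1:]))

    print(f"unigram: {unigram_bits / total:.2f} bits/char")
    print(f"bigram:  {bigram_bits / total:.2f} bits/char")  # lower = better

The bigram model 'knows' more structure (which characters tend to follow which), so it assigns higher probability to the actual text and encodes it in fewer bits. LLMs are the same idea scaled up to extremely high-order structure, and the "concepts" are whatever regularities make the next token more predictable.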