Large language models are pre-trained by their creators on huge amounts of data.
In many cases you do not need to do anything to the LLM and can just use it as-is.
If it was not trained on data containing the information you are interested in, you can use a technique called RAG (Retrieval-Augmented Generation): retrieve relevant documents at query time and pass them to the model as extra context in the prompt.
You can also do fine-tuning, which is additional training but on a small amount of your own data.
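A minimal sketch of the retrieval half of RAG, assuming sentence-transformers for the embeddings; the document list, query, and model name are made up for illustration, and the retrieved text is simply prepended to the prompt:

```python
# Minimal RAG retrieval sketch: embed documents, find the one most similar
# to the query, and inject it into the prompt. The LLM itself is unchanged.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

documents = [
    "Our refund policy allows returns within 30 days.",
    "The API rate limit is 100 requests per minute.",
    "Support is available Monday to Friday, 9am-5pm.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

query = "How many requests per minute can I send?"
query_vec = embedder.encode([query], normalize_embeddings=True)[0]

# Cosine similarity; vectors are already normalized, so a dot product suffices.
scores = doc_vecs @ query_vec
top_doc = documents[int(np.argmax(scores))]

prompt = f"Context:\n{top_doc}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```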
I think what you are referring to is the concept of “fine-tuning”: you take a pretrained network and train it further on a (relatively) small set of new input-output pairs to steer it in a new direction.
It's widely used; you can look it up.
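A minimal sketch of what that looks like in practice, assuming a Hugging Face causal LM ("gpt2" here is just a stand-in checkpoint) and a toy set of made-up Q/A pairs; a real run would use a proper dataset, batching, and a GPU:

```python
# Minimal fine-tuning sketch: take a pretrained causal LM and run a few
# gradient steps on new input-output pairs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any pretrained causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A (relatively) small set of new input-output pairs.
pairs = [
    ("Q: What is the capital of France?", "A: Paris."),
    ("Q: What is 2 + 2?", "A: 4."),
]
texts = [f"{q}\n{a}" for q, a in pairs]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()

for epoch in range(3):
    for text in texts:
        batch = tokenizer(text, return_tensors="pt")
        # For causal LM fine-tuning the labels are the input ids themselves;
        # the model shifts them internally to compute next-token loss.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch}: loss {outputs.loss.item():.4f}")
```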
A more challenging question is whether it is possible to reuse the pretrained weights when training a network with a different architecture (maybe a bigger transformer with more heads, or something).
AFAIK this is not common practice; if you change the architecture, you generally have to retrain from scratch. But given the cost of these training runs, I wouldn't be surprised if OpenAI & co. had developed some technique for this, e.g. across GPT versions.
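One naive way to picture the idea, as a toy sketch only (the models and names here are made up, and this is not a technique any particular lab is known to use): copy whichever parameters from the old checkpoint still match the new architecture by name and shape, and leave the rest randomly initialized.

```python
# Naive weight-reuse sketch: transfer shape-compatible parameters from an
# old model into a new, wider model; everything else stays freshly initialized.
import torch.nn as nn

def make_mlp(hidden: int) -> nn.Sequential:
    # Toy stand-in for "an architecture"; input/output sizes stay fixed
    # while the hidden width changes between versions.
    return nn.Sequential(nn.Linear(16, hidden), nn.ReLU(), nn.Linear(hidden, 4))

old_model = make_mlp(hidden=32)   # "pretrained" network
new_model = make_mlp(hidden=64)   # bigger architecture

old_state = old_model.state_dict()
new_state = new_model.state_dict()

# Keep only tensors whose name and shape still match the new architecture.
transferable = {
    name: tensor
    for name, tensor in old_state.items()
    if name in new_state and new_state[name].shape == tensor.shape
}
new_model.load_state_dict(transferable, strict=False)

print(f"reused {len(transferable)} of {len(new_state)} parameter tensors")
```

Running it shows how little transfers once the width changes, which is exactly why changing the architecture usually means retraining from scratch.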