Generalist AI doesn't scale (2024)

  • This idea has basically already been implemented (even before this post was published) in an architecture called mixture of experts (MoE): the model is split into specialist subnetworks ("experts"), and a learned gate routes each input to only a few of them. That makes the model cheaper to train, while the shared gate and surrounding layers still keep it somewhat interconnected. See the sketch below for the basic mechanics.
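
A minimal sketch of a top-k gated MoE layer, assuming PyTorch; the class name `SimpleMoE` and the hyperparameters (`num_experts`, `top_k`, expert shapes) are illustrative assumptions, not taken from the post or any specific MoE paper.

```python
# Minimal mixture-of-experts sketch (illustrative, not a production design).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoE(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        # The gate scores every expert for every token; this shared routing
        # is what keeps the otherwise-independent experts interconnected.
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Score experts, keep only the top-k per token.
        scores = self.gate(x)                              # (tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)               # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    # Only the selected experts run, so compute per token
                    # stays flat as the expert count grows.
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out


tokens = torch.randn(8, 16)        # 8 tokens, 16-dimensional
layer = SimpleMoE(dim=16)
print(layer(tokens).shape)         # torch.Size([8, 16])
```

The key property is in the forward pass: each token activates only `top_k` of the experts, so capacity can scale with the number of experts without a matching increase in per-token compute, which is what makes these models easier to train at scale.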