This idea has essentially already been implemented (even before this post was published) in a construction called mixture of experts (MoE). It makes the model easier to train while still keeping it somewhat interconnected: a gating network routes each input to a small subset of expert sub-networks, so only a fraction of the parameters is active per input, yet all experts share the same interface and are trained jointly.
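For concreteness, here is a rough sketch of what such a layer can look like. This is a minimal illustrative example with top-k routing; the expert sizes, parameter names, and routing details are my own assumptions, not any particular published implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfExperts(nn.Module):
    """Minimal sparse mixture-of-experts layer: a gating network sends each
    input to its top-k experts and combines their outputs by the gate weights."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small independent feed-forward network (sizes are arbitrary here).
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # The gate decides which experts each input is routed to.
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim). Score all experts, keep only the top-k per input.
        scores = self.gate(x)                                # (batch, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)  # (batch, top_k)
        weights = F.softmax(top_vals, dim=-1)                # renormalise over the chosen experts

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e                 # inputs routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route a batch of 8 vectors of width 16 through 4 experts, 2 experts per input.
layer = MixtureOfExperts(dim=16, num_experts=4, top_k=2)
print(layer(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```

The point of the construction is visible in the loop: each input only pays for `top_k` experts rather than all of them, while the shared gate keeps the experts coupled enough to be trained as one model.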