Sounds like an extremely complex technical problem. I'd also suggest looking at the use cases where this is actually needed. One problem is that loading weights onto the GPU can be slow enough that sharing the GPU between different processes becomes really hard: every switch means a long offload and reload. Would love to learn more about what you do.
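For a rough sense of the reload cost mentioned above, here is a back-of-envelope sketch. The function name and all numbers are illustrative assumptions, not measurements: a 7B-parameter model in fp16 (2 bytes per parameter) and a 32 GB/s host-to-GPU link, which is roughly the theoretical peak of PCIe 4.0 x16. Real transfers are usually slower, and loading from disk is slower still.

```python
def transfer_seconds(num_params: float, bytes_per_param: float,
                     bandwidth_bytes_per_s: float) -> float:
    """Lower-bound time to copy model weights over a link (hypothetical helper)."""
    total_bytes = num_params * bytes_per_param
    return total_bytes / bandwidth_bytes_per_s


# Assumed values, for illustration only.
NUM_PARAMS = 7e9          # 7B-parameter model (assumption)
BYTES_PER_PARAM = 2       # fp16 weights (assumption)
LINK_BANDWIDTH = 32e9     # ~PCIe 4.0 x16 theoretical peak, bytes/s (assumption)

one_way = transfer_seconds(NUM_PARAMS, BYTES_PER_PARAM, LINK_BANDWIDTH)
# A swap between two tenants needs one offload plus one load.
swap = 2 * one_way
print(f"one-way: {one_way:.3f} s, full swap: {swap:.3f} s")
```

Whether that per-swap cost is acceptable depends on how often tenants switch and how large the models are; the estimate scales linearly with model size.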
Yes, I have wanted something like this for a while. I try to avoid using GPUs where possible because of the expense and the ephemeral nature of my use.
Depending on the price difference from a standard GPU instance, absolutely.
Isn’t this just the cloud? You pay for what you use.
Yes, I would use it if the price is affordable.
Maybe prefix with "Ask HN:"?
Depends on your workload...
For anyone curious, here is an early prototype of this tech in action:
https://imgur.com/a/2qPN4ru
Would love to hear your thoughts on how we can make this most useful for you!