The session is tied to a GPU cluster. Switching to another GPU cluster mid-session would actually be quite inefficient, but it's needed in a failure scenario.
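
Rough sketch of the pinning idea above, not an actual implementation: a session keeps going to the same cluster and only gets re-pinned when that cluster is marked failed. `SessionRouter`, `route`, `mark_failed`, the cluster names and the "fewest pinned sessions" fallback rule are all made up for illustration. The cost of the switch itself is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRouter:
    """Hypothetical router: pins each session to one GPU cluster."""
    clusters: list[str]                                   # assumed cluster names
    healthy: set[str] = field(init=False)                 # clusters currently usable
    pinned: dict[str, str] = field(default_factory=dict)  # session_id -> cluster

    def __post_init__(self) -> None:
        # Assumption: every cluster starts out healthy.
        self.healthy = set(self.clusters)

    def mark_failed(self, cluster: str) -> None:
        """Record a cluster failure; its sessions re-pin on their next request."""
        self.healthy.discard(cluster)

    def route(self, session_id: str) -> str:
        """Return the cluster for this session, switching only on failure."""
        current = self.pinned.get(session_id)
        if current in self.healthy:
            return current  # normal case: stay on the pinned cluster

        # New session, or the pinned cluster failed: choose a healthy one.
        candidates = [c for c in self.clusters if c in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy GPU cluster available")

        # Assumed policy: pick the healthy cluster with the fewest pinned sessions.
        load = {c: 0 for c in candidates}
        for c in self.pinned.values():
            if c in load:
                load[c] += 1
        target = min(candidates, key=lambda c: load[c])
        self.pinned[session_id] = target
        return target
```

Usage: `router = SessionRouter(["cluster-a", "cluster-b"])`, then call `router.route(session_id)` on every request and `router.mark_failed(name)` when a health check fails. A session only moves after its cluster has been marked failed.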
good batching and tensor parallelization prob