Mind the Trust Gap: Fast, Private Local-to-Cloud LLM Chat

  • A fast Trusted Execution Environment protocol built on the H100's confidential computing mode. Prompts are decrypted and processed only inside the GPU enclave. The key result is speed: on models with ≥10B parameters, the latency overhead is under 1%. As with CPU confidential computing, this opens a channel to cloud GenAI models that even the provider cannot intercept. I wonder whether something like this could boost trust in all the AI neoclouds out there.
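
  A minimal sketch of the private channel idea above: the client seals a prompt under a session key that only the GPU enclave holds, so the cloud provider relaying the message sees only ciphertext. This is a toy stand-in, not the paper's protocol: it assumes the session key was already established via remote attestation (not shown), and it substitutes a stdlib SHA-256 keystream with encrypt-then-MAC for a real AEAD cipher.

  ```python
  import hashlib, hmac, secrets

  def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
      # Expand key + nonce into a keystream with SHA-256 in counter mode (toy
      # substitute for a real stream cipher such as AES-GCM or ChaCha20).
      out = b""
      counter = 0
      while len(out) < length:
          out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
          counter += 1
      return out[:length]

  def seal(key: bytes, plaintext: bytes) -> tuple[bytes, bytes, bytes]:
      # Client side: encrypt-then-MAC, so the provider relaying traffic sees
      # only (nonce, ciphertext, tag) and cannot read or tamper with prompts.
      nonce = secrets.token_bytes(16)
      ct = bytes(a ^ b for a, b in zip(plaintext, keystream(key, nonce, len(plaintext))))
      tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
      return nonce, ct, tag

  def open_sealed(key: bytes, nonce: bytes, ct: bytes, tag: bytes) -> bytes:
      # Enclave side: verify integrity first, then decrypt the prompt for the model.
      expected = hmac.new(key, nonce + ct, hashlib.sha256).digest()
      if not hmac.compare_digest(tag, expected):
          raise ValueError("ciphertext was tampered with in transit")
      return bytes(a ^ b for a, b in zip(ct, keystream(key, nonce, len(ct))))

  # Hypothetical session key; in the real protocol this would come from an
  # attested key exchange with the H100 enclave.
  session_key = secrets.token_bytes(32)
  nonce, ct, tag = seal(session_key, b"summarize my medical records")
  assert open_sealed(session_key, nonce, ct, tag) == b"summarize my medical records"
  ```

  The round trip works only for a holder of the session key; the per-message nonce keeps identical prompts from producing identical ciphertexts.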

  • author here!