Show HN: Real-time voice chat with AI, no transcription

  • I'm building various prototypes for VR training simulations using Inworld, but they also use the cascaded approach. I'm also building a customer service agent product that we would love to add voice to, but Whisper and ElevenLabs (and others) are just too slow. Is tincan available via API?

  • Very cool. If I ask it to deduce the gender of my voice, can it do that? Training a projection layer makes sense, but ultimately you'd want to output audio conditioned on the input rather than text. Is there a way to train a reverse projection with some kind of skip connections to take the audio input into account? Or an end-to-end audio model?

  • Very cool! How is this differentiated from ChatGPT voice?

  • Very cool!!! I've had this idea for a while. Is the conversational part of the dataset open?