Show HN: Open source framework OpenAI uses for Advanced Voice

  • Imagine being able to tell an app to call the IRS during the day, endure the on-hold wait times, ask the IRS rep your question, and log the answer, then deliver it when you get home.

    Or, have the app call a pharmacy every month to refill prescriptions. For some drugs, the pharmacy requires a manual phone call to refill, which gets very annoying.

    So many use cases for this.

  • This is really helpful, thanks!

    OpenAI hired the former fractional CTO of LiveKit, who created Pion, a popular WebRTC library.

    I'd expect OpenAI to migrate off of LiveKit within 6 months. LiveKit is too expensive. Also, WebRTC is hard, and OpenAI, being a less open company now, will want to keep improvements to itself.

    Not affiliated with any competitors, but I did work at a PaaS company similar to LiveKit that used WebSockets instead.

  • Super cool! Didn't realize OpenAI is just using LiveKit.

    Does the pricing break down to be the same as having an OpenAI Advanced Voice socket open the whole time? It's like $9/hr!

    In theory it would be cheaper to not keep the Advanced Voice socket open the whole time and instead hit the GPT-4o streaming API [1] only when inference is needed (pay per token), using LiveKit's other components to do the rest (TTS, VAD, etc.); a rough sketch of that pipeline follows below.

    What's the trade-off here?

    [1]: https://platform.openai.com/docs/api-reference/streaming
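
    A minimal sketch of that pay-per-token pipeline (STT, then a streaming completion, then TTS) using OpenAI's own endpoints. The handle_turn helper is purely illustrative, model names are whatever is current at the time of writing, and VAD/turn-taking plus the LiveKit transport are left out:

      # Sketch: transcribe a finished utterance, stream tokens from GPT-4o,
      # then synthesize speech; pay per token/minute instead of per socket-hour.
      from openai import OpenAI

      client = OpenAI()  # reads OPENAI_API_KEY from the environment

      def handle_turn(audio_path: str) -> bytes:
          # 1) Speech-to-text on the completed utterance
          with open(audio_path, "rb") as f:
              transcript = client.audio.transcriptions.create(
                  model="whisper-1", file=f
              )

          # 2) Streaming completion, billed per token rather than per open socket
          stream = client.chat.completions.create(
              model="gpt-4o",
              messages=[{"role": "user", "content": transcript.text}],
              stream=True,
          )
          reply = "".join(chunk.choices[0].delta.content or "" for chunk in stream)

          # 3) Text-to-speech on the full reply
          speech = client.audio.speech.create(
              model="tts-1", voice="alloy", input=reply
          )
          return speech.content  # raw audio bytes to send back over LiveKit

    The trade-off you lose is latency: the Realtime socket overlaps listening, thinking, and speaking, while this pipeline runs the three stages serially per turn.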

  • That's some crazy marketing for an "our library happened to support this relatively simple use case" situation. Impressive!

    By the way: the Cerebras voice demo also uses LiveKit for this: https://cerebras.vercel.app/

  • Olivier, Michelle, and Romain gave you guys a shoutout like 3 times in our DevDay recap podcast if you need more testimonial quotes :) https://www.latent.space/p/devday-2024

  • Is there anyone besides OpenAI working on a speech-to-speech model? I find it incredibly useful, and it's the sole reason I pay for their service, but I do find it very limited. I'd be interested to know if any other groups are doing research on voice models.

  • I wonder when Azure OpenAI will get this.

  • This suggests that the AI "brain" receives the user input as a text prompt (the agent relays the speech prompt to GPT-4o) and generates audio as output (GPT-4o streams speech packets back to the agent).

    But when I asked Advanced Voice mode, it said the exact opposite: that it receives input as audio and generates text as output.
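
    For what it's worth, the Realtime API's session config (per the beta docs at the time, so field names may change) lets you request audio on both sides, which suggests the model handles speech natively in and out; the text transcript comes from a separate Whisper pass:

      # Per the beta Realtime docs: both input and output can be raw audio;
      # "modalities" picks what the model emits, and the optional input
      # transcript is produced by a separate whisper-1 transcription.
      session_update = {
          "type": "session.update",
          "session": {
              "modalities": ["audio", "text"],  # speech out plus a text transcript
              "input_audio_format": "pcm16",    # 16-bit PCM in
              "output_audio_format": "pcm16",   # 16-bit PCM out
              "input_audio_transcription": {"model": "whisper-1"},
          },
      }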

  • Nice that they have many partners on this. I see Azure as well.

    There seems to be a consensus that the new Realtime API is not actually using the same Advanced Voice model/engine (or however it works), since at least the TTS part doesn't seem to be as capable as the one shipped with the official OpenAI app.

    Any idea on this?

    Source: https://github.com/openai/openai-realtime-api-beta/issues/2

  • So WebRTC helps with the unreliable network between mobile clients and the server side. If the application is backend-only, would it make sense to use WebRTC, or should I go directly to the Realtime API?
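
    For a backend-only app, going direct over a plain WebSocket is reasonable, since WebRTC's loss handling mostly pays off on the flaky last mile to mobile clients. A minimal sketch; the endpoint, headers, and event names follow the beta docs at launch, so treat them as assumptions:

      # Backend-only: connect straight to the Realtime API over a WebSocket.
      import json, os, websocket  # pip install websocket-client

      ws = websocket.create_connection(
          "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01",
          header=[
              f"Authorization: Bearer {os.environ['OPENAI_API_KEY']}",
              "OpenAI-Beta: realtime=v1",
          ],
      )
      # Ask the model to respond; audio frames would arrive as further events.
      ws.send(json.dumps({
          "type": "response.create",
          "response": {"modalities": ["text"], "instructions": "Say hello."},
      }))
      print(json.loads(ws.recv())["type"])  # first server event, e.g. "session.created"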

  • That was cool, but it got up to $1 of usage real quick.