This is great but we really need an audio-to-audio model like they demoed in the open source world. Does anyone know of anything like that?
Edit: someone found one: https://news.ycombinator.com/item?id=40346992
Siri came out in October 2011. Amazon Alexa made its debut in November 2014. Google Assistant's voice-activated speakers were released in May 2016.
From what I can tell, Siri is still a dumpster fire that nobody is willing to use. And I have no personal experience with Alexa, so I can't speak to it. But I do have a few Google Home speakers and an Android phone, and I have seen no major improvements in years. In fact, it has gotten worse - for example, you can no longer add items directly to AnyList[0], only Google Keep.
Or, as an incredibly simple example of something I thought we'd get a long time ago, it's still unable to interpret two-part requests, e.g. "please repeat that but louder," or "please turn off the kitchen and dining room lights."
I find voice assistants very useful - especially when driving, lying in bed, cooking, or when I'm otherwise preoccupied. Yet they have stagnated almost since their debut. I can only imagine nobody has found a viable way to monetize them.
What will it take to get a better voice assistant for consumers? Willow[1] doesn't seem to have taken off.
[0] https://help.anylist.com/articles/google-assistant-overview/
edit: I realize I hijacked your thread to dump something that's been on my mind lately. Pipecat looks really cool, and I hope it takes off! I hope to get some time to experiment this weekend.
Just made https://feycher.com, which is similar but has realtime lip syncing as well. Let me know if you're interested and we can chat.
We're also building Bolna, an open-source voice orchestration framework: https://github.com/bolna-ai/bolna
LiveKit Agents, which OpenAI uses in voice mode, is also open source:
The whole VAD thing is very interesting; keen to learn more about how it works, especially with multiple speakers!
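At its simplest, VAD is just frame-level speech/no-speech classification over short PCM frames. Here's a minimal sketch using the webrtcvad package, purely to illustrate the general idea (not how Pipecat implements it internally):

    # Frame-level voice activity detection with webrtcvad (illustrative only).
    import webrtcvad

    vad = webrtcvad.Vad(2)          # aggressiveness: 0 (lenient) to 3 (strict)
    SAMPLE_RATE = 16000             # 16 kHz, 16-bit mono PCM
    FRAME_MS = 30                   # webrtcvad accepts 10/20/30 ms frames
    FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # bytes per frame

    def speech_frames(pcm: bytes):
        """Yield (byte_offset, is_speech) for each 30 ms frame of raw PCM."""
        for off in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
            yield off, vad.is_speech(pcm[off:off + FRAME_BYTES], SAMPLE_RATE)

Note that plain VAD only tells you "someone is speaking", not who; handling multiple speakers needs diarization (or separate channels per speaker) layered on top of this.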
Very cool, great work! I can definitely see myself using this when I start building in that direction.
How would I go about using this to live translate phone calls?
I wonder how the just-announced GPT-4o with real-time voice impacts projects like this?
The demo of real-time multi-language translation in conversation blew me away!
Nice to see an open source implementation. I have been seeing many startups get into this space, like https://www.retellai.com/, https://fixie.ai/, etc. They always end up wanting speech-to-speech models (the current approach seems to be speech -> text -> text -> speech, with multiple agents handling one listening + one speaking). Excited to see how this plays with the recently announced GPT-4o.
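For reference, a minimal sketch of that cascaded speech -> text -> text -> speech flow using OpenAI's hosted models (model names are just examples, and this is a batch version, so it skips the streaming/VAD/turn-taking parts that make real-time agents hard):

    # Cascaded pipeline: STT (Whisper) -> LLM -> TTS, via the OpenAI Python SDK.
    # Illustrative only; real-time agents stream each stage instead of batching.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def handle_turn(in_wav: str, out_mp3: str) -> None:
        # 1. Speech -> text
        with open(in_wav, "rb") as f:
            transcript = client.audio.transcriptions.create(model="whisper-1", file=f)
        # 2. Text -> text (the "agent" step)
        chat = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": transcript.text}],
        )
        reply = chat.choices[0].message.content
        # 3. Text -> speech
        speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
        speech.stream_to_file(out_mp3)

A true speech-to-speech model would collapse all three stages into one, which is what makes the GPT-4o voice demos interesting: no transcription step in the middle to lose tone, timing, and prosody.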