Wow, this is really, really bad. Try this one to compare.
https://azure.microsoft.com/en-us/services/cognitive-service...
I don't work for MSFT.
Yeah not really working as expected.
Was interested in this as I'm looking to build a "swearing detector" to help me swear less in video calls, but it couldn't pick up even one sentence properly out of the couple I tried, and then it started throwing errors.
Think it needs some time back in the lab tbh.
TUSTING TUSTING TESTTEST ONE TO TEST TEST ONE DO ONE TWO HREE FOUR OUD FIVE SIX SEVEN EIGHT
I wouldn't exactly call this a success.
Really nice work on the GUI, keep it up!
ASR on my Pixel 6 has been a game changer; the combination of accuracy and speed is great.
The title is very misleading. This is a thin, ~10-line Gradio GUI in front of the Huggingface Pipeline API; the latter downloads 1000+ Python files, a professionally pre-trained 1 GB ASR model, and a 500 MB language model. Gradio contributes nothing to any of that; it is merely the GUI framework.
"Gradio GUI Python Package is compatible with Huggingface Inference Python Package"
Yeah, duh.
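For anyone curious, the glue code is roughly this (my sketch, assuming the standard gradio and transformers pipeline APIs; the model name is a placeholder, not whatever checkpoint the post actually loads):

    # Rough guess at the ~10 lines of glue code, assuming the standard
    # gradio and transformers pipeline APIs.
    import gradio as gr
    from transformers import pipeline

    # The pipeline call is what downloads the pre-trained model;
    # Gradio only provides the web UI around it.
    asr = pipeline("automatic-speech-recognition",
                   model="facebook/wav2vec2-base-960h")

    def transcribe(audio_path):
        # Gradio hands the function a path to the recorded/uploaded audio file.
        return asr(audio_path)["text"]

    gr.Interface(fn=transcribe,
                 inputs=gr.Audio(type="filepath"),
                 outputs="text").launch()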
Also, I'm surprised that they chose Mozilla DeepSpeech, which was last updated in 2020, instead of wav2vec2, which is actually competitive in recognition quality.
EDIT: BTW if you're curious, you can try out many of the Huggingface pre-trained models here:
https://huggingface.co/spaces/huggingface/hf-speech-bench
and, for example, here's a Facebook pre-trained English model with good performance that you can easily embed into your own Python apps (click the [Use in Transformers] button at the top right of the page):
https://huggingface.co/facebook/wav2vec2-base-960h
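If you'd rather skip the pipeline wrapper, loading it directly looks roughly like this (my sketch, not the model card's exact snippet; "sample.wav" is a stand-in for your own 16 kHz mono recording):

    # Sketch of loading the model with the processor + model classes,
    # assuming a local 16 kHz mono WAV file called sample.wav.
    import torch
    import soundfile as sf
    from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

    # The model was trained on 16 kHz audio, so feed it 16 kHz input.
    speech, sample_rate = sf.read("sample.wav")
    inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")

    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy CTC decoding: argmax over the vocabulary, then collapse
    # repeats and blanks in batch_decode.
    predicted_ids = torch.argmax(logits, dim=-1)
    print(processor.batch_decode(predicted_ids)[0])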