Hacker News

SeamlessM4T, a Multimodal AI Model for Speech and Text Translation

by mchiang on 8/22/2023, 1:58:50 PM with 13 comments

by lhl on 8/22/2023, 5:41:10 PM
I gave it a spin a little bit ago. Per usual, install docs didn't quite work OOTB, here's how I got it working: https://llm-tracker.info/books/howto-guides/page/speech-to-t...
One limitation that seems undocumented: the current code only supports relatively short clips, so it isn't suitable for long transcriptions:
> ValueError: The input sequence length must be less than or equal to the maximum sequence length (4096), but is 99945 instead.
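A common workaround for a limit like this is to split the input into pieces that each fit under the maximum and process them one at a time. The sketch below is only an illustration of that idea, not part of seamless_communication: `split_into_chunks`, `MAX_SEQ_LEN`, and the `overlap` parameter are hypothetical names, and it assumes the input can be treated as a plain sequence whose length is measured in the same units as the 4096 limit in the error message.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Limit quoted in the ValueError above; its unit (frames, tokens, ...) is an assumption.
MAX_SEQ_LEN = 4096


def split_into_chunks(
    frames: Sequence[T],
    max_len: int = MAX_SEQ_LEN,
    overlap: int = 0,
) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `frames`, each at most `max_len` long.

    `overlap` repeats that many trailing items at the start of the next
    chunk, which can help when a word straddles a chunk boundary.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    step = max_len - overlap
    for start in range(0, len(frames), step):
        yield frames[start:start + max_len]
        # Stop once a chunk has reached the end, so no redundant tail is emitted.
        if start + max_len >= len(frames):
            break


if __name__ == "__main__":
    # Stand-in for the 99945-long input from the error message.
    fake_input = list(range(99945))
    chunks = list(split_into_chunks(fake_input))
    print(len(chunks), len(chunks[-1]))
```

Each chunk would then be passed to the model separately and the outputs joined. Fixed-size cuts can land mid-word, so splitting on silence is likely a better choice for real audio.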
by crakenzak on 8/22/2023, 4:31:29 PM
code: https://github.com/facebookresearch/seamless_communication
paper: https://ai.meta.com/research/publications/seamless-m4t/
demo: https://seamless.metademolab.com/
by 0cf8612b2e1e on 8/22/2023, 5:33:43 PM
Will there be a whispercpp equivalent? Half the reason I love whisper is how dead simple it is to get running. I will take somewhat lower accuracy for easier operation.
Edit: unless there is native speaker diarization. That would be a huge value add.
by msp26 on 8/22/2023, 6:16:42 PM
All I want is llama-2-34b (seriously what's taking so long on this specific model) but this is interesting too I guess.
by rvz on 8/22/2023, 6:47:43 PM
Yet somehow, many here underestimated Meta’s position in AI and proclaimed that Meta was dying and was not important and far behind in the AI race.
How things change dramatically in one year with such exaggeration of Meta’s collapse in 2022.
Not only are they in the lead in $0 free AI models, they are also at the finish line in the AI race to zero.
by jimmies on 8/22/2023, 6:11:11 PM
Lol, they botched the first example, the one that translates “Our goal is to create a more connected world” to Vietnamese: it has a glaring typo at the end of the sentence, “hơn” instead of “hơ.” It also really messed up the pronunciation: it read “Chúng tôi” as “Chúng ta”, which are totally different words phonetically. The pronunciation also sounds like it’s made by someone who is mentally sick. So they botched both the translation and the pronunciation.
That’s so embarrassing, especially for something meant to show how good their stuff is (although I think it’s probably not the AI’s fault). It just shows how sloppy their people are.
I know they have plenty of Vietnamese engineers there. Did the PR dept just throw this final version of the video out without reviewing with them?
by houseatrielah on 8/22/2023, 5:26:06 PM
SeamlessM4T-Medium { 1.2B params, filesize 6.8 GB }. Wondering how it compares to OpenAI's Whisper.
by gigel82 on 8/22/2023, 5:49:49 PM
The speech recognition in their demo is very very bad (~60% in my empirical test, vs. 95% with WhisperCPP). The translation is also very inaccurate.
That said, I fully support open releases and look forward to future versions and improvements.
by Havoc on 8/23/2023, 9:03:24 AM
Disappointing license. Here's a useful thing, but be sure to not use it for the majority of use cases
by Jayakumark on 8/22/2023, 5:32:32 PM
Meta is killing it with these open models. Not sure why the Tamil language is missing on output.
by villgax on 8/23/2023, 7:48:02 AM
Non-commercial as per frickin usual
by jacooper on 8/22/2023, 5:28:00 PM
What's the license?
by 1attice on 8/22/2023, 10:46:46 PM
....'M4T', ahem, might mean slightly more than you think it does