http://www.josesotelo.com/speechsynthesis/files/wav/blizzard...
I have not laughed this hard in a long time.
The demo page isn't clearly presented.
For example, on this page, only Spanish has the char2wav label.
http://www.josesotelo.com/speechsynthesis/
It's unclear which results are the output of the model.
To my ear, many of the synth voices sound very similar to people who are either drunk or have a brain injury. I'm not complaining; it's an interesting parallel.
So how does this work? It's not very clear from the article.
Proceeding to feed this Paul Bettany's "Jarvis" from some movies...
I’d like to feed The Chaos to it and see how it fares.
I was just looking today for some open source TTS engines. I found Merlin (https://github.com/CSTR-Edinburgh/merlin) and MaryTTS (http://mary.dfki.de/). Are you guys aware of any better ones? What about open source speech recognition, is there a clear leader?
I am getting tired of people implementing "deep learning to convert foo into bar" and staking a claim on the name "foo2bar".
It leads to "AI hallucination", where even if "foo2bar" doesn't work, people assume that it's the one right AI for turning foo into bar. When someone gets better at turning foo into bar, the typical response will be "is that just foo2bar?"
This happened absurdly backwards with doc2vec: after word2vec, everyone talked about it as if it were a real thing, until Radim Řehůřek finally made a reasonable implementation of it under that name.