DiffRhythm: Fast End-to-End Full-Length Song Generation with Latent Diffusion

  • This is primarily architecturally interesting, in my opinion. The output songs have unusual, noticeable artifacts, and I would guess they become more noticeable the more you listen.

    That said, wow. An end-to-end FAST architecture that can infer a 4.5-minute song in 10 seconds is a compelling thing. I didn’t see if we got open weights, but my guess is that this is not crazy challenging to train, and some v2/v3 versions of this are likely to be good-to-very-good.

  • The style matching is interesting, but there's no song structure. There's no identifiable chorus in any of the demo songs.

  • If I am to retain any interest as an amateur music writer without pro audio engineering skills and equipment, but with a day job, I want tools that help me bring MY vision to reality. That means multi-tracking, the ability to hum or score a melody and have it transferred to a musical instrument, the ability to bring in existing tracks, provide a temporal segment for diffusion, and ask it to 'generate a counterpoint to the melody with strings', etc. The most exciting possibility of this is enabling talented writers with day jobs, not one-click songwriting.

  • Goodness, the music that is produced has almost no discernible time signature. I don't know if my brain is faulty, but I find it extremely annoying to listen to.

  • Cool. Obviously needs some work. Lots of artifacts. Something to build on though.

    Lots of sour grapes comments from folks. Too bad. Not what I expect out of Hacker News. Glad people are pushing the technological envelope and exploring this space despite the strong negative emotions.

  • the "prompt" is the 10-second original audio file + the lyrics, right?

    absolutely crazy

  • None of this is music. It is noise that sounds like music. Pretty analogous to how AI slop is not information, but just words arranged to look like information.

  • Business hates creatives. They'll do anything to automate us away.

  • It’s just combining sample WAV files without human coordination, talk about a lame-ass achievement. It’s already easy enough to set the BPM and load files into Ableton and warp them into unison; from what I heard, this is basically just that with “HOORAY FOR AI” slathered on top as a veneer.

    If you think I’m being harsh, I have my reasons as a professional musician to critique these things in an unflattering light: they are my competition. Thankfully, actually “generated” AI music is trash. Copyright is problematic in the US, I admit, but tech bros using copyrighted material to train programs to put us out of business - without paying a penny, which even Spotify doesn’t pay per stream - yeah, I’ll have some disdain about this scenario, and I feel it’s justified.

    Just because you can doesn’t mean you should.

  • One thing that strikes me about almost every AI-generated track (from academic or commercial generators), even when it's "competent" - in that it has reasonable melodies, chord progressions, etc. - is how average it is. Mediocre, taking the term literally. In a way that also highlights the cliches and crutches common in human-made music. Somewhat reminiscent of GPT text that drones on and on in a grammatically correct way but conveys little of interest. This is of course not unexpected, given how these models are trained. I wonder if this will have the effect of pushing (human) musicians to be more experimental - to move away from the conventions that are now just a click away for anyone.