Stable Diffusion launch announcement

  • I've been quite excited for this

    I had a fair bit of fun with DALL-E, but it's very expensive and I found the product too much like a toy - a pair of plastic scissors made for children. I've had my prompts blocked many times by what appears to be "Scunthorpe"-style filtering.

    Also, the creativity is a bit muted, and I'm convinced anything an American would describe as un-Christian has been purged from its training data, leaving an air of vapidity - I found myself wasting many prompts trying to get it to generate vomit, for example.

    And of course, there's the ironically named "OpenAI" being beaten to the punch at actually releasing something that isn't just an API.

  • "Stable diffusion runs on under 10 GB of VRAM on consumer GPUs, generating images at 512x512 pixels in a few seconds. This will allow both researchers and soon the public to run this under a range of conditions, democratizing image generation."

    Oooooh I can actually run this at home!
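    To see why it fits on a consumer card, here's a back-of-envelope sketch. The parameter counts below are approximate figures commonly cited for Stable Diffusion v1 (treat them as assumptions, not official numbers); at fp16 precision the weights alone are only a couple of gigabytes, leaving plenty of the quoted 10 GB budget for activations:

    ```python
    # Rough fp16 memory estimate for Stable Diffusion v1's three components.
    # Parameter counts are approximate/assumed, not taken from the announcement.
    params = {
        "unet": 860e6,          # denoising U-Net (the bulk of the model)
        "text_encoder": 123e6,  # CLIP ViT-L/14 text encoder
        "vae": 84e6,            # image autoencoder
    }
    total_params = sum(params.values())
    weights_gb = total_params * 2 / 1024**3  # fp16 = 2 bytes per parameter
    print(f"~{total_params / 1e6:.0f}M params -> ~{weights_gb:.1f} GB of fp16 weights")
    ```

    The remaining headroom goes to activations, attention buffers, and the sampler's intermediate latents, which is why the announcement can quote "under 10 GB" rather than the tens of gigabytes Imagen-class models need.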

  • Super excited about this. A tool that's comparable to, or even outperforms, DALL-E 2, but open to the public, open source, and without the insane content restrictions or draconian banning policies.

  • The most interesting thing is that the model is relatively small compared to Imagen, DALL-E 2, and Parti. They specifically trained this model so that people can easily run it on their own GPUs. I think StabilityAI will train a larger version of Stable Diffusion, perhaps with a larger text encoder, since the one used in this model is quite small and I think that is the biggest bottleneck; Imagen shows that scaling the text encoder actually matters more than scaling the generator. In the end, the architecture is not very different from the LDM-400m that CompVis had already trained, but it is conditioned on CLIP text embeddings instead of text tokens, the autoencoder was trained at 512x512 instead of 256x256, and of course Stable Diffusion was trained for much longer.
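    The efficiency the comment describes comes from the latent-diffusion trick: the U-Net denoises in the autoencoder's compressed latent space rather than in pixel space. A quick sketch of the size difference, assuming the commonly described 8x-downsampling VAE with 4 latent channels (an assumption here, not stated in the announcement):

    ```python
    # Elements the diffusion model must denoise, pixel space vs latent space.
    # Assumes an 8x spatial downsampling autoencoder with 4 latent channels.
    pixel_elems = 512 * 512 * 3             # raw RGB image
    latent_elems = (512 // 8) * (512 // 8) * 4  # 64x64x4 latent tensor
    print(f"{pixel_elems // latent_elems}x fewer elements in latent space")
    ```

    Working on a tensor roughly 48x smaller is a large part of why the model generates 512x512 images in seconds on consumer hardware.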

  • The fact that they're releasing the code and weights is so insanely cool. Love enriching the public goods.

  • The outputs from this model are super impressive... Would be cool for it to be usable with Google Colab soon!

    Unrelated, but multimodal.art has been doing very cool work on building a whole little app you can run from a colab. But their models are pretty underwhelming at the moment.

  • I'm getting a 503 error right now, but here is a cache: http://webcache.googleusercontent.com/search?q=cache:https:/...

  • MidJourney is fantastic when it comes to creating AI art. DALL-E is better at some things (it seems to understand depth better, can draw hands, better at cartoon characters), but Stability looks fantastic and I'm really excited to try it.

    I think I heard the Stable Diffusion folks call DALL-E the McDonald's of AI art, and based on my experience, I agree.

  • For someone only tangentially familiar with this space, how is this different from e.g. https://github.com/nerdyrodent/VQGAN-CLIP which you can also run at home? Is it the quality of the generated images?

  • I've been looking for an ML model to try hosting/terraforming out on GCP that could conceivably recoup the costs of hosting. I think this might be it.