This is a really cool blog post - the idea that you can steer a model with guidance towards a desired outcome feels like a really valuable feature. I'm currently working on a project that generates stable diffusion images, and this kind of technique would be so useful ;-)
I'm curious what initial results you got when training on only 6 images. Were the generated images not Wildsmith-ish enough?
It's pretty wild how much value the Automatic1111 webui has enabled people to create.
I thought this was cool because it's very different from the usual anime crap.
And you can see how smooth variation in the underlying math produces these smoothly varying images in response to the prompt.
I wrote this post because I wanted to know what effect one of the main latent diffusion model parameters -- the classifier-free guidance scale (cfg_scale) -- was having on the sampling process.
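For concreteness, here's a minimal sketch of where cfg_scale enters each denoising step. This isn't the post's exact code: the `unet(...)` call and `.sample` attribute follow diffusers-style conventions (UNet2DConditionModel), and the function name is just for illustration.

```python
def guided_noise_pred(unet, latents, t, text_emb, uncond_emb, cfg_scale):
    # Classifier-free guidance runs the UNet twice per denoising step:
    # once conditioned on the prompt embedding, once on the empty-prompt
    # ("unconditional") embedding.
    noise_cond = unet(latents, t, encoder_hidden_states=text_emb).sample
    noise_uncond = unet(latents, t, encoder_hidden_states=uncond_emb).sample
    # cfg_scale = 1 gives the plain conditional prediction; larger values
    # push the result further along the (conditional - unconditional)
    # direction, i.e. harder toward the prompt.
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)
```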
As well as smoothly varying the cfg_scale, I think it would be fascinating to do some mechanistic interpretability on latent diffusion models -- something like the Microscope tool OpenAI used to have for convnets: https://microscope.openai.com/models