For MNIST your model is sufficient. But itโs not structurally complex enough to generate more complex images, even if you scale it up.
[flagged]
Have you employed a dozen ethicists to go through the input and output to make sure it can't say any slurs? [EDIT] That's why small ones don't exist
But the images your network generates look nothing like MNIST digits. There is also no variance. For example, all 9s are identical.