Ask HN: Why do we need complex text-to-image networks if this simple one works?

  • But the images your network generates look nothing like MNIST digits. There is also no variance. For example, all 9s are identical.

  • For MNIST your model is sufficient. But it's not structurally complex enough to generate more complex images, even if you scale it up.

  • [flagged]

  • Have you employed a dozen ethicists to go through the input and output to make sure it can't say any slurs? [EDIT] That's why small ones don't exist