Note: this model was just released by DeepSeek: https://github.com/deepseek-ai/Janus
Later discussion: https://news.ycombinator.com/item?id=42843131
Has anybody used this yet who can share example outputs?
As for what it is: it is a multimodal LLM that can accept both text and images, and generate both text and images as output.