FLUX.1 Kontext

  • I'm currently testing this out (using the Replicate endpoint: https://replicate.com/black-forest-labs/flux-kontext-pro). Replicate also hosts "apps" with examples using FLUX Kontext for some common image-editing use cases: https://replicate.com/flux-kontext-apps

    It's pretty good: the quality of the generated images is similar to GPT-4o image generation for simple image-to-image edits, and generation is speedy at roughly 4 seconds per image.

    Prompt engineering outside of the examples on this page is a little fussy and I suspect will evolve over time. Changing styles or specific aspects does work, but the more specific you get, the more it tends to ignore the specifics. A minimal call through the Replicate Python client is sketched below.
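
    (The input field names below are my reading of the model's schema on Replicate and may change; treat this as a sketch, not the authoritative API.)

      # Minimal sketch: call the hosted flux-kontext-pro endpoint via the
      # Replicate Python client (pip install replicate, with REPLICATE_API_TOKEN set).
      import replicate

      output = replicate.run(
          "black-forest-labs/flux-kontext-pro",
          input={
              "prompt": "Make the jacket bright red; keep everything else unchanged",
              "input_image": open("photo.png", "rb"),  # the image being edited
          },
      )
      print(output)  # typically a URL / file handle for the edited image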

  • I tried this out and a hilarious "context-slip" happened:

    https://imgur.com/a/gT6iuV1

    It generated (via a prompt) an image of a space ship landing on a remote planet.

    I asked for an edit: "The ship itself should be more colourful and a larger part of the image".

    And it replaced the space-ship with a container vessel.

    It had the chat history, so it should have understood I still wanted a space-ship, but it dropped the relevant context for what I was trying to achieve.

  • Some of these samples are rather cherry-picked. Has anyone actually tried the professional headshot app from the "Kontext Apps"?

    https://replicate.com/flux-kontext-apps

    I've thrown half a dozen pictures of myself at it and it just completely replaced me with somebody else. To be fair, the final headshot does look very professional.

  • I'm debating whether to add the FLUX Kontext model to my GenAI image comparison site. The Max variant definitely scores higher in prompt adherence, nearly doubling Flux 1.dev's score, but it still falls short of OpenAI's gpt-image-1, which (visual fidelity aside) is sitting at the top of the leaderboard.

    I liked keeping Flux 1.D around just to have a nice baseline for local GenAI capabilities.

    https://genai-showdown.specr.net

    Incidentally, we did add the newest release of Hunyuan's Image 2.0 model but as expected of a real-time model it scores rather poorly.

    EDIT: In fairness to Black Forest Labs, this model seems more focused on editing capabilities (refining and iterating on existing images) than on strict text-to-image creation.

  • Technical report here for those curious: https://cdn.sanity.io/files/gsvmb6gz/production/880b07220899...

  • Is input restricted to a single image? If you could use more images as input, you could do prompts like "Place the item in image A inside image B" (e.g. "put the character of image A in the scenery of image B"), etc.

  • So here is my understanding of the current native image generation landscape. I might be wrong, so please correct me; I'm still learning this and I'd appreciate the help.

    Native image gen was first introduced in Gemini 1.5 Flash, if I'm not wrong, and then OpenAI released it for 4o, which took over the internet with Ghibli art.

    We have been getting good-quality images from almost all image generators (Midjourney, OpenAI, and other providers), but the thing that made this special was its truly "multimodal" nature. Here's what I mean:

    When you used to ask ChatGPT to create an image, it would rephrase your prompt and internally send it to DALL-E; similarly, Gemini would send it to Imagen. Those were diffusion models, and they had little to no context in your next turn about what was in the previous image.

    In native image generation, the same model understands audio, text, and even image tokens, and doesn't need to rely on a separate diffusion model internally. I don't think either OpenAI or Google has published how they trained it; my guess is that it's partially autoregressive and partially diffusion, but I'm not sure.
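
    To make the distinction concrete, here is a purely illustrative sketch (neither provider has published its actual pipeline, and every name below is made up):

      # Toy code only: the model objects are stand-ins, not real APIs.
      def tool_style_generation(chat_history, request, chat_model, diffusion_model):
          # Older setup: the chat model rewrites the request into a standalone text
          # prompt and hands it to a separate diffusion model (DALL-E / Imagen).
          # The diffusion model never sees the earlier images, so follow-up edits
          # like "make the ship bigger" easily lose context.
          prompt = chat_model.rewrite_as_prompt(chat_history, request)
          return diffusion_model.generate(prompt)

      def native_generation(chat_history, request, multimodal_model):
          # Native setup: one model consumes interleaved text and image tokens,
          # including previously generated images, and emits image tokens directly,
          # so edits can stay consistent with earlier turns.
          tokens = multimodal_model.tokenize(chat_history + [request])
          return multimodal_model.generate_image(tokens)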

  • How knowledgeable do you need to be to tweak and train this locally?

    I spent two days trying to train a LoRA customization on top of Flux 1 dev on Windows with my RTX 4090 but can't make it work, and I don't know how deep into this topic and the Python libraries I need to go. Are there script kiddies in this game, or only experts?
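
    For what it's worth, the training side is usually done with community scripts (e.g. kohya's sd-scripts, ostris's ai-toolkit, or the diffusers DreamBooth-LoRA example) rather than hand-written code, and only testing the result needs a little Python. A minimal sketch of loading a trained LoRA on top of FLUX.1 dev with diffusers follows; the weights filename and trigger word are placeholders:

      # Sketch: test a trained LoRA on FLUX.1 dev with Hugging Face diffusers
      # (pip install diffusers transformers accelerate safetensors).
      import torch
      from diffusers import FluxPipeline

      pipe = FluxPipeline.from_pretrained(
          "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
      )
      pipe.enable_model_cpu_offload()  # helps fit a 24 GB card like the RTX 4090
      pipe.load_lora_weights("my_flux_lora.safetensors")  # placeholder path

      image = pipe(
          "photo of sks_person hiking in the Alps",  # placeholder trigger word
          num_inference_steps=28,
          guidance_scale=3.5,
      ).images[0]
      image.save("out.png")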

  • Pretty good!

    I like that they are testing face and scene coherence with iterated edits -- major pain point for 4o and other models.

  • Hopefully they list this on Hugging Face for the open-source community. It looks like a great model!

  • > show me a closeup of…

    Investigators will love this for “enhance”. ;)

  • Don’t understand the remove from face example. Without other pictures showing the person’s face, it’s just using some stereotypical image, no?

  • Anyone have a guess as to when the open dev version gets released? More like a week or a month or two I wonder.

  • Can it generate chess? https://manifold.markets/Hazel/an-ai-model-will-successfully...

  • I vibed up a little chat interface https://kontext-chat.vercel.app/

  • It still has no idea what a Model F keyboard looks like. I tried prompts and editing, and got things that weren't even close.

  • You can try it now on https://fluxk.art.

  • I'm trying to log in to evaluate this, but the Google auth redirects me back to localhost:3000.

  • Nobody tested that page on mobile.

  • I wonder if this is using a foundation model or a fine-tuned one.

  • Do they open-source their model structure?

  • Is it possible to provide a mask? I'm having a hard time changing a specific area.

  • Single-shot LoRA-style effects, if they work as well as the cherry-picked examples suggest, will be a game changer for editing.

    As with almost any AI release though, unless it’s open weights, I don’t care. The strengths and weaknesses of these models are apparent when you run them locally.