Show HN: First large-scale evaluation of 4o Image Generation from OpenAI

  • I thought OpenAI had been sleeping a bit and had given up on the image generation race. However, with the recent release of the oddly named 4o Image Generation model (why not continue with the DALL-E naming scheme? I personally love the pun), it looks like they cooked up one hell of a model.

    If you are into visual GenAI, you have probably already seen many examples of quite incredible outputs from the new model. We wanted to go beyond individual examples, so we ran a large-scale evaluation based on 200k human responses across 13k image pairings.

    Unfortunately, that also meant we had to generate a large number of new images, and since OpenAI has not yet opened up API access, we had to do it manually through the UI :(.

    The benchmark tests the model on coherence, prompt alignment, and overall aesthetic preference. On the first two in particular, OpenAI's new model is far ahead of the competition.
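
    For anyone wondering how pairwise human votes can turn into a per-criterion ranking, here is a minimal sketch. It is not our actual pipeline: the tuple layout, the function name win_rates, and the model names are all made up for illustration, and it only computes plain win rates. For scale, 200k responses over 13k pairings works out to roughly 15 responses per pairing on average.

```python
from collections import defaultdict

# Hypothetical vote format (an assumption, not the real dataset schema):
# (criterion, model_a, model_b, winner), where winner is model_a or model_b.
def win_rates(votes):
    """Return {(criterion, model): fraction of comparisons that model won}."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for criterion, model_a, model_b, winner in votes:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} is not part of the pairing")
        # Both models took part in this comparison.
        for model in (model_a, model_b):
            total[(criterion, model)] += 1
        wins[(criterion, winner)] += 1
    return {key: wins[key] / total[key] for key in total}

# Made-up votes, purely to show the shape of the computation.
votes = [
    ("coherence", "model_x", "model_y", "model_x"),
    ("coherence", "model_x", "model_y", "model_x"),
    ("coherence", "model_x", "model_y", "model_y"),
    ("alignment", "model_x", "model_y", "model_y"),
]
rates = win_rates(votes)

# Average number of responses per pairing, from the figures in the post.
avg_responses_per_pairing = 200_000 / 13_000
```

    A plain win rate is the simplest option; a rating model such as Bradley-Terry or Elo would be a reasonable alternative when models are not compared against each other equally often.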

    Check out the detailed results and the collected data, which is openly available on Hugging Face!

    Let me know if you have questions or feedback!