Grok-1.5 Vision Preview

  • I think a lot of people are sleeping on xAI for two reasons - Twitter & Tesla FSD data.

    I've seen numerous talks recently by AI leaders discussing how the next level of LLMs has to "understand the physical world" (an area where Tesla FSD is a leader; see Elon's response to Sora - https://www.news18.com/tech/elon-musk-claims-tesla-has-been-...), along with Twitter being the "town square" of the internet, giving them an immense advantage. I wouldn't be surprised if the Twitter API was completely cut off one day.

  • Interesting that they built their own benchmark and that it primarily features images from vehicles. Tesla overlap?

    > ... we are introducing a new benchmark, RealWorldQA. This benchmark is designed to evaluate basic real-world spatial understanding capabilities of multimodal models. While many of the examples in the current benchmark are relatively easy for humans, they often pose a challenge for frontier models.

    > The initial release of the RealWorldQA consists of over 700 images, with a question and easily verifiable answer for each image. The dataset consists of anonymized images taken from vehicles, in addition to other real-world images ... RealWorldQA is released under CC BY-ND 4.0.

    Will be interesting to see the feedback once someone has a chance to look into the dataset (https://data.x.ai/realworldqa.zip).

    Side note — I'm very impressed with their "Explaining a meme" example.

  • If those comparisons to other models are remotely accurate, catching up that quickly is a pretty impressive feat.