This is extremely impressive: it significantly beats the state-of-the-art models from Google, OpenAI and Anthropic on both OSWorld and Android World, and it is based on a relatively small open-source model [0] that can be run locally. At this pace of progress, I now wouldn't bet against AIs hitting the human threshold of UI interaction before the end of the year.
[0] https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B