Super cool to see this idea working. I had a go at getting an LLM to play Pokémon in 2023, with OpenAI's vision model. With only 100 expensive API calls a day, I shelved the project after putting together a quick POC and finding that the model struggled to see things or work out where the player was. I guess models are better now, but it also looks like people are providing the model with information in addition to the game screen.
https://x.com/sidradcliffe/status/1722355983643525427?t=dYMk...
You can also directly pull in the emulation state and map back to game source code, and then make a script for tool use (not shown here): https://github.com/pret/pokemon-reverse-engineering-tools/bl... Well I see on your page that you already saw the pret advice about memory extraction, hopefully the link is useful anyway.
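To make the "pull in the emulation state" idea a bit more concrete, here is a minimal sketch of decoding a raw memory snapshot into named fields that a tool call could hand back to the model. Everything named here is an assumption for illustration: the `SYMBOLS` table, its addresses, and the `read_state` / `as_tool_result` helpers are hypothetical, and real addresses would have to come from the pret disassembly's symbol files.

```python
import json

# Hypothetical symbol table: field name -> (address, length in bytes).
# These addresses are placeholders, NOT real game addresses.
SYMBOLS = {
    "party_count": (0x10, 1),
    "player_x": (0x20, 1),
    "player_y": (0x21, 1),
}


def read_state(memory, symbols=SYMBOLS):
    """Decode each named field from a flat, bytes-like memory snapshot."""
    state = {}
    for name, (addr, length) in symbols.items():
        raw = bytes(memory[addr:addr + length])
        state[name] = int.from_bytes(raw, "little")
    return state


def as_tool_result(memory, symbols=SYMBOLS):
    """Serialize the decoded state as JSON, e.g. as the result of a tool call."""
    return json.dumps(read_state(memory, symbols), sort_keys=True)
```

The snapshot is just any bytes-like object, so this works the same whether it comes from an emulator API or a saved dump.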
See also: the AI Plays Pokemon project that went megaviral a year or so ago, using CNNs and RL instead of LLMs: https://github.com/PWhiddy/PokemonRedExperiments
> I believe that Claude Plays Pokemon isn't doing any of the memory parsing I spent a ton of time, they are just streaming the memory directly to Claude 3.7 and it is figuring it out
It is implied they are using structured Pokemon data from the LLM and saving it as a knowledge base. That is the only way they can get live Pokemon party data to display in the UI: https://www.twitch.tv/claudeplayspokemon
The AI Plays Pokemon project above does note some of the memory addresses where that data is contained, since it used that data to calculate the reward for the PPO.
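As a rough illustration of turning memory reads into a reward signal, here is a small sketch that compares two memory snapshots and rewards level gains and new badges. This is not that project's actual reward function. The address layout is passed in as a parameter and is hypothetical, the weights are arbitrary, and it assumes badges are stored as one bit per badge in a single byte.

```python
def party_level_sum(memory, addrs):
    """Sum the levels of the Pokémon currently in the party."""
    count = min(memory[addrs["party_count"]], len(addrs["levels"]))
    return sum(memory[a] for a in addrs["levels"][:count])


def badge_count(memory, addrs):
    """Count set bits in the badge byte (assumed: one bit per badge)."""
    return bin(memory[addrs["badges"]]).count("1")


def reward(prev, curr, addrs, level_weight=1.0, badge_weight=5.0):
    """Reward the change between two snapshots; weights are arbitrary."""
    level_delta = party_level_sum(curr, addrs) - party_level_sum(prev, addrs)
    badge_delta = badge_count(curr, addrs) - badge_count(prev, addrs)
    return level_weight * level_delta + badge_weight * badge_delta
```

Rewarding the difference between steps, instead of the absolute totals, keeps the agent from being paid repeatedly for progress it already made.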
I just find it insane that we're bootstrapping reinforcement learning and world planning on top of basic next token prediction.
I'm amazed that it works, but also amazed that this is the approach being prioritized.
Have you considered calling this bot "intern bot"? - Jay
I want to note that if you really wanted an AI to play Pokémon you can do it with a far simpler and cheaper AI than an LLM and it would play the game far better, making this mostly an exercise in overcomplicating something trivial. But sometimes when you have a hammer everything will look like a nail.
What's the appeal of Pokemon for these kinds of things? I never see AI or Twitch chat playing other turn-based games like Final Fantasy or Fire Emblem.
You can use Claude's computer use functions to actually play it on an emulator with no programming at all - but that kind of feels like cheating :D
Was working on a similar thing last year! Might as well open source it at this point too.
Honestly, Claude 3.7 can make a Pokemon game in pygame fairly easily, and at that point it would have a lot more control over it.
Really cool experiment! The idea of AI 'playing' games as a form of entertainment is fascinating—kind of like Twitch streams but fully autonomous. Curious, what were the biggest hurdles with input control? Lag, accuracy, or something else?
> To me, this is the future of TV.
The future of television is watching bots play video games? What a sad future.
Related ongoing thread:
Claude Plays Pokémon - https://news.ycombinator.com/item?id=43173825