This would be amazing with Smalltalk/Pharo or a similar language where debugging is a first-class citizen (I guess the same goes for certain Lisps?)
Nice! I recently had the same idea and built it using MCP (to stay client/LLM agnostic) and VS Code (DAP would be even better, but I haven't tried tackling it).
also see: https://github.com/plasma-umass/ChatDBG
Nice UI. We started on a project that does this about two years ago: ChatDBG (https://github.com/plasma-umass/ChatDBG), downloaded about 70K times to date. It integrates with debuggers like `lldb`, `gdb`, and `pdb` (the Python debugger). For C/C++, it also leverages a language server, which makes a huge difference. You can also chat with it. We wrote a paper about it; it should be published shortly at a major conference near you (https://arxiv.org/abs/2403.16354). One of the coolest things we found is that the LLM can leverage real-world knowledge to diagnose errors; for example, it successfully debugged a problem where the number of bootstrap samples was too low.
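For the `pdb` flavor, the flow is roughly as follows (see the repo for the current commands; they may have changed):

```
python3 -m pip install chatdbg

# run your script under chatdbg instead of pdb,
# continuing until it hits an error
python3 -m chatdbg -c continue yourscript.py

# once it stops, ask the debugger to diagnose the failure
(ChatDBG) why
```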
Really great idea. I currently work for an AI, compiling and debugging its code; at least, that's what it sometimes feels like. Who is the agent here, exactly? The fact that the AI has no real understanding of what we are doing, and doesn't apply the information it does know to solve problems, is challenging. If it could at least debug the code, it would see for itself that it is clobbering the same registers it is using, instead of me having to explain that to it. Fortunately, I'm only talking about my hobby projects; I pity the people doing this for a living now.
Hopefully work like this has the side effect of making debuggers work again. In my experience they seldom do these days (except in old-school tech like C, C++, and Go), presumably because the younger folks were told in college that debugging wasn't necessary. I don't mean that debuggers simply fail to run; rather, they're broken enough that they're not worth using. Perhaps an LLM that adds print statements to code and reads the output would be more in keeping with the times?
Very cool concept! There's a lot of potential in shortening the try-debug-fix cycle for LLMs.
On a related note, here's a Ruby gem I wrote that captures variable state at the moment an Exception is raised. It gets you non-interactive, text-based debugging for exceptions.
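The gem isn't linked here, but the underlying trick is easy to sketch. Below is a minimal Python version of the same idea (Python rather than Ruby, to match the rest of the thread): install an exception hook that walks the traceback and dumps each frame's locals as text. The names are illustrative, not the gem's API.

```python
import sys
import traceback

def dump_locals_hook(exc_type, exc, tb):
    # Print the usual traceback first.
    traceback.print_exception(exc_type, exc, tb)
    # Then walk every frame in the traceback and dump its locals,
    # giving a non-interactive snapshot of state at raise time.
    while tb is not None:
        frame = tb.tb_frame
        print(f"\n-- locals in {frame.f_code.co_name} "
              f"({frame.f_code.co_filename}:{tb.tb_lineno}) --")
        for name, value in frame.f_locals.items():
            print(f"  {name} = {value!r}")
        tb = tb.tb_next

sys.excepthook = dump_locals_hook
```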
Hey, this is lovely!
I created an extension to help me debug a while back [0], and I'd thought about this (AI integration) for a long time, but didn't have the time to tackle it.
Thank you so much for sharing!
I might need to add this approach to my extension as well!
A time-traveling debugger for Python + LLM would be amazing.
Nice. I did this manually once with ipdb, just copying and pasting the text into an LLM and having it tell me which variables to inspect and what to press.
I like that the first paragraph of the README clearly states this is your research project instead of making a lot of grandiose claims.
Thank you for this. Would love to see it integrated into Copilot or Cursor.
Interesting experiment! This feels like it could really expand the potential applications of LLMs. Exciting to see how AI can assist in debugging with live runtime context!
Do not let it debug itself!!
Honestly, this is the first LLM concept that makes me want to change my workflow. I don't use VS Code, but I'm excited by the idea.
To people who do this sort of thing:
Does it make sense to use this to generate synthetic data for RL, so models get really good at debugging code? Current LLMs have inhaled all the code in the world, but that data is only the text of the code (maybe plus the changes that fixed bugs, etc.), whereas the amount of insight that could be generated by actually running the code and capturing the runtime values, step by step, is almost infinite.
Is this sort of data useful for training LLMs?
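Not an answer, but a rough sketch of what the collection step could look like: run a function under Python's `sys.settrace` and record, at each executed line, the line number and a snapshot of the locals. The (source, trace) pairs would be the candidate training data. Everything here (the helper names, the toy bug) is made up for illustration.

```python
import sys

def collect_trace(func, *args, **kwargs):
    """Run func and record (lineno, locals-as-repr) for each executed line."""
    trace = []

    def tracer(frame, event, arg):
        # Only record line events inside the function we're tracing.
        if event == "line" and frame.f_code is func.__code__:
            snapshot = {k: repr(v) for k, v in frame.f_locals.items()}
            trace.append((frame.f_lineno, snapshot))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(None)
    return result, trace

def buggy_mean(xs):
    total = 0
    for x in xs:
        total += x
    return total / (len(xs) - 1)  # off-by-one bug, visible in the trace

result, trace = collect_trace(buggy_mean, [1, 2, 3])
for lineno, local_vars in trace:
    print(lineno, local_vars)
```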