Per the cited real-world figures, that's about 1 in 40 tests that pass human review, or a success rate of about 2.5%.
It's hard to see value in spending resources this way right now - most notably, engineer time to review the generated tests. Improve the hit rate by an order of magnitude, and I suspect I'd feel differently.
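As a back-of-the-envelope sketch of that arithmetic (the function name is made up for illustration, and it assumes every generated test costs about the same to review):

```python
def reviews_per_accepted_test(hit_rate: float) -> float:
    """Expected number of generated tests a human reviews per accepted test."""
    if not 0 < hit_rate <= 1:
        raise ValueError("hit_rate must be in (0, 1]")
    return 1 / hit_rate


current = 1 / 40          # about 2.5%, per the figures above
improved = current * 10   # an order of magnitude better, i.e. about 25%

print(reviews_per_accepted_test(current))
print(reviews_per_accepted_test(improved))
```

That is roughly 40 reviews per kept test today versus roughly 4 after a 10x improvement, which is where the trade-off starts to look different to me.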
Tried this out on a Ruby codebase and it generated Python tests: https://github.com/Codium-ai/cover-agent/issues/17. Is there any data available on whether this actually works?
Why does this webpage have auto-playing audio?
The audio track that starts on page load, with no obvious way to stop it, prevents me from reading this content. Please don't do that.
Using ChatGPT to generate unit tests works great almost out of the box, but I guess this system solves the remaining 5% to make it fully automated end-to-end. I believe this will work and help us write better software, given that I have experienced numerous cases where the generated tests (even with inferior models) catch not-so-obvious bugs.
Seems decent enough for boilerplate. But if my code is incorrect, won’t an LLM generate a test for incorrect code?
Interesting idea. I generally don’t run tests at all (hobbyist), so even mediocre LLM tests may actually be a win.
Don't see any actual output measurement in the conclusion — it seems like the effort may not have really borne fruit.
To the OP:
Is your name a reference to Gronky Scripples? https://www.youtube.com/watch?v=4KG3v365mq4
Love that you took something that Meta wrote about but didn't actually release and then... did it for them haha :)
I get redirected to an "oops" 404 page when I try to create an account using GitHub.
Any chance of supporting integrations with AWS, Azure, GCP APIs?
How do people feel about LLM generated tests?
I tried creating some on a personal project just using ChatGPT and it saved me a lot of toil on tests I probably wouldn’t have written. I did find I had low trust in refactoring my code, but higher than if I’d had no tests.
It seemed like a net positive for low risk cases.