I put the examples he gave into Claude 4 (Sonnet), purely asking it to eval the code, and it pointed out every single issue with the snippets (N+1 query, race condition, memory leak). The article doesn't mention which model was used, how exactly it was used, or in which environment/IDE it was used.
The rest of the advice in there is sound, but without more specifics I don't know how actionable the section "The spectrum of AI-appropriate tasks" really is.
I think this is correct, and I also think it holds for reviewing human-authored code: it's hard to do the job well without first having your own idea in your head of what the correct solution looks like [even if that idea is itself flawed].
My experience with LLMs currently is that they can handle any level of abstraction and focus, but you have to discern the "layer" to isolate and resolve.
The next improvement may be something like "abstraction isolation", but for now I can vibe code a new feature, which produces something mediocre. Then I ask "is that the cleanest approach?" and it improves it.
Then I might ask "is this performant?" Or "does this follow the structure used elsewhere?" Or "does this use existing data structures appropriately?" Etc.
Much like the blind men describing an elephant, they might each be right individually yet collectively still be wrong. Newer, slower models are definitely better at this, but I think that rather than throwing infinite context at problems, if they were designed with a more top-down architectural view and a checklist of competing concerns, we might get a lot further in less time.
This seems to be how a lot of people are using them effectively right now: create an architecture, implement piecemeal.
"But what if the real issue is an N+1 query pattern that the index would merely mask? What if the performance problem stems from inefficient data modeling that a different approach might solve more elegantly?"
In the best case you would have to feed every important piece of information into the context: these are my indexes, this is my function, these are my models. After that the model can find the problematic code. So the main problem is getting all of the important information to your model, and that part has to be fixed if the user isn't doing it. (Obviously that doesn't mean the LLM will find the actual problem, but it can improve the results.)
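To make the quoted concern concrete: below is a minimal, self-contained sketch (Python with sqlite3, made-up customers/orders tables) of an N+1 access pattern where an index speeds up each individual lookup but only masks the real problem, which is the query-per-row structure.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        CREATE INDEX idx_orders_customer ON orders(customer_id);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
    """)

    # N+1 pattern: one query for the orders, then one more query per order.
    # The index makes each lookup fast, but the query count still grows with N.
    def customer_names_slow():
        orders = conn.execute("SELECT id, customer_id FROM orders").fetchall()
        return [
            conn.execute("SELECT name FROM customers WHERE id = ?", (cid,)).fetchone()[0]
            for _, cid in orders
        ]

    # Restructuring the access removes the N+1 entirely; the fix is the join,
    # not the index.
    def customer_names_fast():
        rows = conn.execute(
            "SELECT c.name FROM orders o JOIN customers c ON c.id = o.customer_id"
        ).fetchall()
        return [name for (name,) in rows]

    print(customer_names_slow(), customer_names_fast())

This is roughly the kind of context (schema, indexes, calling code) the model would need in order to suggest the join rather than the index.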
This isn't new. Haven't we already seen this everywhere? The example at the top of the article (in a completely different field, no less) just goes to show humans had this particular sin nailed well before AI came along.
Bloated software and unstable code bases abound. This is especially prevalent in legacy code whose maintenance is handed down from one developer to the next, each of whom understands the code base differently from their predecessor. Combine that with pressure to ship now vs. getting it right, and you have the perfect recipe for an insidious form of technical debt.
This largely seems like an alternative way of saying "you have to validate the results of an LLM." Is there any "premature closure" risk if you simply validate the results?
Premature closure is definitely a risk with LLMs, but I think code is much less at risk because you can and SHOULD test it. It's a bigger problem for things you can't validate.
I might start calling this "the original sin" with LLMs... not validating the output. There are many problems people have identified with using LLMs, and perhaps all of them come back to not validating.
I initially thought the layout of the sections was an odd and terrible poem.
My experience is that AIs amplify what you put in them.
If you put in lazy problem definitions, provide the bare minimum context, and review the code cursorily, then the output is equally lackluster.
However, if you spend a good amount of time describing the problem, carefully construct a context that includes examples, documentation, and relevant files, and then review the code with care, you can get some very good code out of them. As I've used them more and more, I've noticed that the LLM responds in a thankful way when I provide good context.
> Always ask for alternatives
> Trust but verify
I treat the AI as I would a promising junior engineer. And this article is right: you don't have to accept the first solution from either a junior engineer or the AI. I constantly question the AI's decisions, even when I think they are right. I just checked AI Studio, and the last message I sent to Gemini was "what is the reasoning behind using metadata in this case instead of a pricing column?", the context being a db design discussion where the LLM suggested using an existing JSONB metadata column rather than adding a new column. Sometimes I already agree with the approach; I just want to see the AI give an explanation.
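For what it's worth, here is a rough sketch of the tradeoff in that kind of discussion. This is not the commenter's actual schema; the tables and columns are hypothetical, written with SQLAlchemy just to make the two options concrete.

    # Hypothetical SQLAlchemy models, for illustration only.
    from sqlalchemy import Column, Integer, Numeric, String
    from sqlalchemy.dialects.postgresql import JSONB
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    # Option A: stash pricing inside an existing JSONB metadata column.
    # No migration and very flexible, but no type/constraint checking and
    # harder to index or query.
    class ProductA(Base):
        __tablename__ = "products_a"
        id = Column(Integer, primary_key=True)
        name = Column(String, nullable=False)
        meta = Column("metadata", JSONB, default=dict)  # e.g. {"price": "19.99"}

    # Option B: a dedicated, typed pricing column.
    # Requires a schema migration, but the database enforces the type and
    # the column is trivially indexable and queryable.
    class ProductB(Base):
        __tablename__ = "products_b"
        id = Column(Integer, primary_key=True)
        name = Column(String, nullable=False)
        price = Column(Numeric(10, 2), nullable=False)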
And on the trust front, I often let the AI coding agent write the code the way it wants to write it rather than force it to write it exactly like I would, just as I would with a junior engineer. Sometimes it gets it right and I learn something. Sometimes it gets it wrong and I have to correct it. I would estimate 1 out of 10 changes has an obvious error/problem that requires me to intervene.
I think of it this way: I control the input and I verify the output.