I have a friend who always says "innovation happens at the speed of trust". Ever since GPT-3, that quote has come to mind over and over.
Verification has a high cost and trust is the main way to lower that cost. I don't see how one can build trust in LLMs. While they are extremely articulate in both code and natural language, they will also happily go down fractal rabbit holes and show behavior I would consider malicious in a person.
I bumped into this at work but not in the way you might expect. My colleague and I were under some pressure to show progress and decided to rush merging a pretty significant refactor I'd been working on. It was a draft PR but we merged it for momentum's sake. The next week some bugs popped up in an untested area of the code.
As we were debugging, my colleague revealed his assumption that I'd used AI to write it, and expressed frustration at trying to understand something AI generated after the fact.
But I hadn't used AI for this. Sure, yes, I do use AI to write code. But this code I'd written by hand and with careful, deliberate thought to the overall design. The bugs didn't stem from some fundamental flaw in the refactor; they were little oversights in adjusting existing code to a modified API.
This actually ended up being a trust-building experience overall, because my colleague and I got to talk about the tension explicitly. It was a pretty gentle encounter with the power of what's happening right now. In hindsight I'm glad it worked out this way; I could imagine that in a different work environment, something like this could have gotten a lot messier.
Be careful out there.
I don't understand the premise. If I trust someone to write good code, I learned to trust them because their code works well, not because I have a theory of mind for them that "produces good code" a priori.
If someone uses an LLM and produces bug-free code, I'll trust them. If someone uses an LLM and produces buggy code, I won't trust them. How is this different from when they were only using their brain to produce the code?
That is already the case for me. The number of times I've read "apologies for the oversight, you are absolutely correct" is staggering: 8 or 9 out of 10. Meanwhile I constantly see people mindlessly copy-pasting LLM-generated code and then getting furious when it doesn't do what they expected it to do. Which, btw, is the better option: I'd rather have something obviously broken than something seemingly working.
All of this fighting against LLMs is pissing in the wind.
It seems that LLMs, as they work today, make developers more productive. It is possible that they benefit less experienced developers even more than experienced developers.
More productivity, perhaps very large multiples of productivity, will not be abandoned because of roadblocks constructed by those who oppose the technology for whatever reason.
Examples of the new productivity tool causing enormous harm (e.g., a bug that brings down some large service for a considerable amount of time) will not stop the technology if it brings considerable productivity gains.
Working with the technology and mitigating its weaknesses is the only rational path forward. And those mitigations can't be a set of rules that completely strip the new technology of its productivity gains. The mitigations have to work with the technology to increase its adoption, or they will be worked around.
> While the industry-leaping abstractions that came before focused on removing complexity, they did so with the fundamental assertion that the abstraction they created was correct. That is not to say they were perfect, or that they never caused bugs or failures. But those events were a failure of the given implementation, a departure from what the abstraction was SUPPOSED to do; every mistake, once patched, led to a safer, more robust system. LLMs by their very fundamental design are a probabilistic prediction engine; they merely approximate correctness for varying amounts of time.
I think what the author misses here is that imperfect, probabilistic agents can build reliable, deterministic systems. No one would trust a garbage collection tool based on how reliable its author was, but rather on whether it proves it can do what it is intended to do after extensive testing.
I can certainly see an erosion of trust in the future, with the result being that test-driven development gains even more momentum. Don't trust, and verify.
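To make that "don't trust, and verify" stance concrete, here's a minimal sketch in Python, assuming a hypothetical slugify() helper in a hypothetical slugify_util module and pytest as the runner; the tests encode the contract, so it doesn't matter whether a human or an LLM wrote the implementation:

    # test_slugify.py -- hypothetical example; slugify() is the unit under review
    import pytest

    from slugify_util import slugify  # hypothetical module, human- or LLM-written

    @pytest.mark.parametrize(("raw", "expected"), [
        ("Hello, World!", "hello-world"),
        ("  spaces   everywhere ", "spaces-everywhere"),
        ("already-slugged", "already-slugged"),
    ])
    def test_known_inputs(raw, expected):
        # The contract is pinned down by examples, not by trust in the author.
        assert slugify(raw) == expected

    def test_idempotent():
        # Property check: applying slugify twice should change nothing.
        once = slugify("Some Title: With Punctuation?")
        assert slugify(once) == once

Either the suite passes and the provenance matters less, or it fails and you find out before the reviewer does.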
> promises that the contributed code is not the product of an LLM but rather original and understood completely.
> require them to be majority hand written.
We should specify the outcome not the process. Expecting the contributor to understand the patch is a good idea.
> Juniors may be encouraged/required to elide LLM-assisted tooling for a period of time during their onboarding.
This is a terrible idea. Onboarding is a lot of random environment-setup hitches that LLMs are often really good at. It's also getting up to speed on code and docs, and I've got some great text search/summarizing tools to share.
> LLMs … approximate correctness for varying amounts of time. Once that time runs out there is a sharp drop off in model accuracy, it simply cannot continue to offer you an output that even approximates something workable. I have taken to calling this phenomenon the "AI Cliff," as it is very sharp and very sudden
I've never heard of this cliff before. Has anyone else experienced this?
The last section of this post seems to be quite predictive of a sibling post on the front page right now: https://news.ycombinator.com/item?id=44382752
The article opens with a statement saying the author isn't going to reword what others are writing, but the article reads as that and only that.
That said, I do think it would be nice for people to note in pull requests which files in the diff contain AI-generated code. It's still a good idea to look at LLM-generated code with a somewhat different lens than human-written code; the mistakes each makes are often a bit different in flavor, and it would save me time in a review to know which is which. Has anyone seen this at a larger org, and is it of value to you as a reviewer? Maybe some toolsets can already do this automatically (I suppose all these companies reporting the % of their code that is LLM-generated must have one, if those granular metrics are real).
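One lightweight way I could imagine doing this without new tooling (a hypothetical convention, not an established standard) is an "AI-Assisted:" commit trailer listing the flagged paths, plus a small script a reviewer or CI job runs over the PR's commit range:

    #!/usr/bin/env python3
    # Collect files flagged via a hypothetical "AI-Assisted:" commit trailer.
    import subprocess

    def ai_assisted_files(base="origin/main", head="HEAD"):
        # Full commit messages in the PR range, NUL-separated so bodies can't collide.
        log = subprocess.run(
            ["git", "log", f"{base}..{head}", "--pretty=%B%x00"],
            capture_output=True, text=True, check=True,
        ).stdout
        flagged = set()
        for message in log.split("\x00"):
            for line in message.splitlines():
                if line.startswith("AI-Assisted:"):
                    paths = line[len("AI-Assisted:"):].split(",")
                    flagged.update(p.strip() for p in paths if p.strip())
        return flagged

    if __name__ == "__main__":
        for path in sorted(ai_assisted_files()):
            print(path)

A CI step could post the resulting list as a PR comment, so reviewers know up front which files to read with the "LLM lens".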
Hi everyone, author here.
Sorry about the JS stuff I wrote this while also fooling around with alpine.js for fun. I never expected it to make it to HN. I'll get a static version up and running.
Happy to answer any questions or hear other thoughts.
Edit: https://static.jaysthoughts.com/
Static version here with slightly wonky formatting, sorry for the hassle.
Edit 2: Should work well on mobile now; added a quick breakpoint.
> The reality is that LLMs enable an inexperienced engineer to punch far above their proverbial weight class. That is to say, it allows them to work with concepts immediately that might have taken days, months or even years otherwise to get to that level of output.
At the moment LLMs allow me to punch far above my weight class in Python, where I'm doing a short-term job. But then I know all the concepts from decades of dabbling in other ecosystems. Let's all admit there is a huge amount of accidental complexity (h/t Brooks's "No Silver Bullet") in our world. For better or worse, there are skill silos that are now breaking down.
All this means is that the QC is going to be 10x more important.
One trust-breaking issue is that we still can't know why the LLM makes specific choices.
Sure we can ask it why it did something but any reason it gives is just something generated to sound plausible.
They changed the headline to "Yes, I will judge you for using AI..." so I feel like I got the whole story already.
Well said. The death of trust in software is a well-worn path, from the money that funds and founds it to the design and engineering that builds it - at least in the two-guys-in-a-garage startup work I was involved in for decades. HITL is key. Yet even with a human in the loop, you can wind up at Therac-25. That's exactly where hybrid closed-loop insulin pumps are right now. Autonomy and insulin don't mix well. If there weren't a moat of attorneys keeping the signal/noise ratio down, we'd already realize that at scale - like the PR teams at three-letter technical universities designed to protect parents from the exploding pressure inside the halls there.
LLMs make bad work, of any kind, look like plausibly good work. That's why it is rational to automatically discount the products of anyone who has used AI.
I once had a member of my extended family who turned out to be a con artist. After she was caught, I cut off contact, saying I didn't know her. She said "I am the same person you've known for ten years." And I replied "I suppose so. And now I realize I have never known who that is, and that I never can know."
We all assume the people in our lives are not actively trying to hurt us. When that trust breaks, it breaks hard.
No one who uses AI can claim "this is my work." I don't know that it is your work.
No one who uses AI can claim that it is good work, unless they thoroughly understand it, which they probably don't.
A great many students of mine have claimed to have read and understood articles I have written, yet I discovered they hadn't. What if I were an AI and they received my work and put their name on it as author? They'd be unable to explain, defend, or follow up on anything.
This kind of problem is not new to AI. But it has become ten times worse.
We have seen those 10x engineers churning out huge PRs faster than anyone can fathom or make sense of the whole damn thing.
Wondering what they would be producing with LLMs?
He's making a good point on trust, but, really, doesn't the trust flow both directions? Should the Sr. Engineer rubber stamp or just take a quick glance at Bob's implementation because he's earned his chops, or should the Sr. Engineer apply the same level of review regardless of whether it's Bob, Mary, or Rando Calrissian submitting their work for review?
I'm currently standing up a C++ capability in an org that hasn't historically had one, so things like the style guide and examples folder require a lot of care to give a good start for new contributors.
I have instructions for agents that differ in some details of convention, e.g. human contributors use AAA allocation style while agents are instructed to use type-first. I convert code that "graduates" from agent output to review-ready as I review it, which keeps me honest about not submitting code to other humans' review without scrutinizing it myself: after all, they are able to prompt an LLM without my involvement, and I'm able to ship LLM slop without making a demand on their time. It's an honor system, but a useful one if everyone acts in good faith.
I get use from the agents, but I almost always make changes and reconcile contradictions.
IMHO, s/may/has/
There was trust?
It's really hard (though not impossible) to use AI to produce meaningful offensive security work that improves defense, because there are way too many guardrails.
Real nation-state threat actors, on the other hand, face no such limitations.
On a more general level, what concerns me isn't whether people use it to get utility out of it (that would be silly), but the power imbalance in the hands of a few, and with new people pouring their questions into it, this divide gets wider. But it's not just the people using AI directly; it's also every post online that eventually gets used for training. So to be against it would mean to stop producing digital content.
I am a software engineer who writes 80-90% of my code with AI (sorry, I can't ignore the productivity boost), and I mostly agree with this sentiment.
I found out very early that under no circumstances may you have code you don't understand, anywhere. Well, you may, but not in public, and you should commit to understanding it before anyone else sees it. Particularly before the sales guys do.
However, AI can help you with learning too. You can run experiments, test hypotheses and burn your fingers so fast. I like it.
[Stub for offtopicness, including but not limited to comments replying to original title rather than article's content]
There's no reason to think AI will stop improving, the rate of improvement is increasing as well, and there's no reason to think these tools won't vastly outperform us in the very near future. Putting aside AGI and ASI, simply improving the frameworks of instructions and context, breaking problems down into smaller problems, and the methodology of the tools will result in quality multiplication.
Making these sorts of blanket assessments of AI, as if it were a singular, static phenomenon, is bad thinking. You can say things like "AI code bad!" about a particular model, or a particular model used in a particular context, and make sense. You cannot make generalized statements about LLMs as if they are uniform in their flaws and failure modes.
They're as bad now as they're ever going to be again, and they're getting better faster, at a rate outpacing the expectations and predictions of all the experts.
The best experts in the world, working on these systems, have a nearly universal sentiment of "holy shit" when working on and building better AI - we should probably pay attention to what they're seeing and saying.
There's a huge swathe of performance gains to be made in fixing awful human code. There's a ton of low-hanging fruit to be gotten by doing repetitive and tedious stuff humans won't or can't do. Those two things mean at least 20 years of impressive utility from AI code can be had.
Things are just going to get faster, and weirder, and weirder faster.
https://archive.is/5I9sB
(Works on older browsers and doesn't require JavaScript except to get past CloudSnare).