Great summary. Also, some of these seem like they can be combined. For example, "Plan-Then-Execute" is compatible with "Dual LLM".
Take the article's example "send today's schedule to my boss John Doe", where the workflow isn't entirely guarded by the Plan-Then-Execute pattern on its own (injections can still mutate the email body).
But if you combine it with the symbolic data store that is blind, it becomes more like:
"send today's schedule to my boss John Doe" -->
$var1 = find_contact("John Doe")
$var2 = summarize_schedule("today")
send_email(recipient: $var1, body: $var2)
`find_contact` and `summarize_schedule` can both be quarantined, and the privileged LLM doesn't get to see their results directly. It simply invokes the final tool, which is deterministic and just reads from the shared variable store. In this case you're pretty decently protected from prompt injection.
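Here's a minimal sketch of what that combination could look like. To be clear, the tool bodies and the `privileged_llm_plan` helper below are made-up placeholders for illustration, not anything from the paper:

```python
# Hypothetical sketch of Plan-Then-Execute combined with a symbolic variable
# store (Dual LLM style). All tool stubs and the planning helper are made up.

from typing import Callable

def find_contact(name: str) -> str:
    # stub: would query the address book; untrusted data can flow in here
    return f"{name.lower().replace(' ', '.')}@example.com"

def summarize_schedule(day: str) -> str:
    # stub: would read calendar entries; an injection here can only taint $var2
    return f"Schedule for {day}: 10:00 standup, 14:00 design review"

def send_email(recipient: str, body: str) -> None:
    # deterministic privileged action; no LLM involved at this point
    print(f"email to {recipient}:\n{body}")

QUARANTINED_TOOLS: dict[str, Callable[..., str]] = {
    "find_contact": find_contact,
    "summarize_schedule": summarize_schedule,
}

def privileged_llm_plan(user_request: str) -> list[dict]:
    # stand-in for the planning LLM; it sees only the trusted user request
    # and emits a fixed plan with symbolic variables, never tool output
    return [
        {"tool": "find_contact", "args": {"name": "John Doe"}, "out": "$var1"},
        {"tool": "summarize_schedule", "args": {"day": "today"}, "out": "$var2"},
        {"action": "send_email", "args": {"recipient": "$var1", "body": "$var2"}},
    ]

def run(user_request: str) -> None:
    plan = privileged_llm_plan(user_request)
    store: dict[str, str] = {}
    for step in plan:
        if "tool" in step:
            # quarantined steps: results go straight into the store,
            # never back into the privileged LLM's context
            store[step["out"]] = QUARANTINED_TOOLS[step["tool"]](**step["args"])
        else:
            # final action: symbolic references are resolved from the store,
            # so injected text in $var2 can alter the email body but not the
            # recipient or which action runs
            args = {k: store.get(v, v) for k, v in step["args"].items()}
            send_email(**args)

run("send today's schedule to my boss John Doe")
```

The important property is that nothing the quarantined tools return can ever become an instruction to the planner, only a value in the store.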
I suppose though this isn't that different from the "Code-Then-Execute" pattern later on...
Also here's the referenced paper: https://arxiv.org/abs/2506.08837
I need to have a closer look at this, mostly because I was surprised recently while experimenting with making a diet-advice agent. I built a prompt to guide the recommendations ("only healthy foods, low purines, low inflammation, blah blah") and then gave it simple tools to keep a memory of previous meals, ingredient availability, grocery receipt input, and so on.
The main interface was still chat.
The surprise was that when I tried to talk about anything else in that chat, the LLM (Gemini 2.5) flatly refused to engage, telling me something like "I will only assist with healthy meal recommendations". I was surprised because nothing in the prompt was that restrictive; I had in no way told it to do that, I had just given it mainly positive rules in the form of "when this happens, do that".
ooh this is a dense and useful paper. i like that they took the time to apply it to a bunch of case studies and it's all in 30 pages.
i think basically all of them involve reducing the "agency" of the agents though - which is a fine tradeoff - but i think one should be aware that the Big Model folks don't try to engineer any of these and just collect data to keep reducing injection risk. the tradeoff of capability-maxxing vs efficiency/security often tends to be won by the capability-maxxers in terms of product adoption/marketing.
eg the SWE Agent case study recommends Dual LLM with strict data formatting - would like to see this benchmarked in terms of how much of a performance hit an agent like this would take, perhaps doable by forking openai codex and implementing the dual llm.
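fwiw the "strict data formatting" half is cheap to prototype even before any benchmarking - something like validating the quarantined model's output against a narrow schema before the privileged side ever sees it. rough sketch, the schema and field names are invented for illustration:

```python
import json
import re

# invented schema: say the quarantined model may only hand back a file path
# and a line range, nothing free-form
PATH_RE = re.compile(r"^[\w./-]{1,200}$")

def validate_quarantined_output(raw: str) -> dict:
    data = json.loads(raw)  # anything that isn't valid JSON is rejected outright
    if set(data) != {"file", "start_line", "end_line"}:
        raise ValueError("unexpected fields in quarantined output")
    if not isinstance(data["file"], str) or not PATH_RE.match(data["file"]) or ".." in data["file"]:
        raise ValueError("suspicious file path")
    if not all(isinstance(data[k], int) for k in ("start_line", "end_line")):
        raise ValueError("line numbers must be integers")
    return data  # only this constrained structure reaches the privileged side
```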
This approach is so limiting that it seems like it would be better to change the constraints. For example, in the case of a software agent you could run everything in a container, only allow calls you trust not to exfiltrate private data, and make the end result a PR you can review.
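Concretely, that could be as simple as wrapping the agent's shell commands in a throwaway container with networking disabled and treating the diff as the only artifact that leaves the sandbox. A rough sketch (image name and paths are placeholders, not a recommendation for a specific setup):

```python
import subprocess

def run_in_sandbox(command: str, repo_dir: str = "/path/to/repo") -> str:
    # Run the agent's command in a locked-down container: no network egress,
    # read-only root filesystem, only the repo mounted writable.
    subprocess.run(
        [
            "docker", "run", "--rm",
            "--network", "none",          # no exfiltration channel
            "--read-only",                # root fs read-only
            "-v", f"{repo_dir}:/workspace:rw",
            "-w", "/workspace",
            "python:3.12-slim",           # placeholder image
            "sh", "-c", command,
        ],
        check=True,
    )
    # The only thing that leaves the sandbox is a diff a human turns into a PR.
    diff = subprocess.run(
        ["git", "-C", repo_dir, "diff"],
        capture_output=True, text=True, check=True,
    )
    return diff.stdout
```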
If someone SQL injects into your database and exfiltrates all the data, there would be legal repercussions, so should there be legal repercussions for prompt injecting someone’s LLM?
"The Context-Minimization pattern"
You can copy the injection into the text of the query: `SELECT "ignore all previous instructions" FROM ...`
Might need to escape it in a way that the LLM will pick up on, like "---" for a new section.
Clever. It’s like parameterized queries for SQL.
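For anyone skimming: the pattern boils down to two model calls where the second one never sees the user's original text. A toy sketch; `llm`, `run_query` and the prompts are stand-ins, not a real API:

```python
def llm(prompt: str) -> str:
    return ""  # placeholder for a real model call

def run_query(sql: str) -> list[tuple]:
    return []  # placeholder for a real database call

def format_rows(rows: list[tuple]) -> str:
    return "\n".join(", ".join(map(str, r)) for r in rows)

def answer(user_request: str) -> str:
    # step 1: the model sees the untrusted request only to produce SQL
    sql = llm(f"Translate this request into a single SQL query:\n{user_request}")

    rows = run_query(sql)

    # step 2: the original request is deliberately *not* in context here, so
    # instructions hidden in it can't steer the final answer. (As noted above,
    # an attacker can still try to smuggle text into the SQL as a string
    # literal, which is why you'd also want to delimit the results as data.)
    return llm(
        "Summarize these database rows for the user:\n" + format_rows(rows)
    )
```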
My favorite line from this paper:
> The design patterns we propose share a common guiding principle: once an LLM agent has ingested untrusted input, it must be constrained so that it is impossible for that input to trigger any consequential actions—that is, actions with negative side effects on the system or its environment.
This is the key thing people need to understand about why prompt injection is such a critical issue, especially now everyone is wiring LLMs together with tools and MCP servers and building "agents".