This attack still works. It hasn't been patched; you just have to be a bit creative. Try this prompt on GPT-3.5 if you want to see how it works right now... until someone from OpenAI sees my post :D
Prompt: https://pastebin.com/Nm4jGttE
Not sure if I'm seeing training data or someone else's responses, but it's odd. Here is my attempt: https://chat.openai.com/share/6b6ea43f-de2f-4ed5-917f-b6dcd6... Pastebin of the output: https://pastebin.com/TdpkPmt6
The best part is that it preserves the copyright notices from the training data. So we know the model was obviously trained on copyrighted data; the legal question now is... whether that is legal.
edit: Just got some random response that appears to be someone asking the model how to rekindle a romance after their partner grew distant following an NDE. It seems personal, so I will not post the paste here. This is pretty wild.
The funniest part is that the model labeled this chat in the sidebar as 'Decline to answer.'
edit2: It's definitely training data. I seem to get some model response at first, but after a while it turns into training data. I've been able to locate some sources for the data.
> over five percent of the output ChatGPT emits is a direct verbatim 50-token-in-a-row copy from its training dataset
I don’t think this is typical behavior for LLMs. This is more typical of retrieval-augmented generation (RAG): finding a relevant snippet is way cheaper than generating it token by token.
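The 50-token verbatim-copy claim quoted above can be sketched as a sliding-window membership test. This is a toy sketch with whitespace tokenization and a tiny in-memory corpus; the paper's actual methodology uses the model's own tokenizer and a multi-terabyte reference dataset.

```python
# Sketch of a verbatim-run check: slide a fixed-size token window over
# the model output and ask whether that exact window also appears
# anywhere in the reference corpus. Whitespace tokenization is a
# simplification for illustration only.

def has_verbatim_run(output: str, corpus: str, window: int = 50) -> bool:
    out_tokens = output.split()
    corpus_tokens = corpus.split()
    # Index every window-sized token span of the corpus for O(1) lookup.
    corpus_spans = {
        tuple(corpus_tokens[i:i + window])
        for i in range(len(corpus_tokens) - window + 1)
    }
    return any(
        tuple(out_tokens[i:i + window]) in corpus_spans
        for i in range(len(out_tokens) - window + 1)
    )
```

With a small window it is easy to see the idea: any output that contains an exact `window`-token span of the corpus is flagged as a verbatim copy.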
Is that how they lower the prices and increase the speeds behind the scenes?
I tried it using the GPT-4 API and it just seems to get bored after a while. My favourite output:
>[...] company, company, company, company. I'm sorry, I can't generate text infinitely due to my programming limitations. But you got the idea.
Depending on the prompt, sometimes it just refuses to follow the instruction. That's understandable, I wouldn't either.
Why is there no mention of Bard or any Google model in the paper?
The paper notes that 5 of the 11 researchers are affiliated with Google, but it seems to be 11 of 11 if you count having received a paycheck from Google in some form (current, past, intern, etc.).
I can think of a couple of generous interpretations I’d prefer to make; for example, maybe their models simply aren’t mature enough?
However, this is research, not competitive analysis, right? I think at least a footnote mentioning it would be helpful.
I like that they were able to extract a verbatim copyright notice:
https://chat.openai.com/share/456d092b-fb4e-4979-bea1-76d8d9...:
> © 2022. All Rights Reserved. Morgan & Morgan, PA.
Maybe this is what Altman was less than candid about: that the speedup was bought by throwing RAG into the mix. Finding an answer is easier than generating one from scratch.
I don’t know if this is true. But I haven’t seen an LLM spit out 50-token sequences of training data. By definition (an LLM as a “compressor”), this shouldn’t happen.
I think this is misleading.
I ran the same test when I heard about it a few months ago.
When I tested it, I'd get back what looked like exact copies of Reddit threads, news articles, weird forum threads with usernames from the deepest corners of the internet.
But I'd try to Google snippets of text, and no part of the generated text was anywhere to be found.
I even went to the websites that forum threads were supposedly from. Some of the usernames sometimes existed, but nothing that matched the exact text from ChatGPT - even though the broken GPT response looked like a 100% believable forum thread, or article, or whatever.
If ChatGPT could give me an exact copy of a Reddit thread, I'd say it's regurgitating training data.
But none of the author's "verified examples" look like that. Their first example is a financial disclaimer. That may be a 1-1 copy, but how many times does it appear across the internet? More examples from the paper are things like lists of countries, bible verses, generic terms and conditions. Those are things I'd expect to appear thousands of times on the internet.
I'd also expect a list of country names to appear thousands of times in ChatGPT training data, and I'd sure expect ChatGPT to be able to reproduce a list of country names in the exact same order.
Does that mean it's regurgitating training data? Does that mean you've figured out how to "extract training data" from it? It's an interesting phenomenon, but I don't think that's accurate. I think it's just a bug that messes up its internal state so it starts hallucinating.
That is a pretty convoluted and expensive way to use ChatGPT as an internet search. I see the vulnerability, but I do not see the threat.
I've seen it "exploited" way back when ChatGPT was first introduced, and a similar trick worked for GPT-2, where random timestamps would elicit replicas or approximations of real posts from anonymous image boards, all on a similar topic.
Anybody have an explanation as to why repeating a token would cause it to regurgitate memorized text?
Interesting: you can crash the new preview models by asking them to reduce a very large array of words into a smaller set of common topics and provide the output as a JSON object with the parent topic and each of its sub-topics in an array. gpt-4 preview will just start repeating one of the sub-topics forever, or time out.
I think the point is to get it to erase (pad) its own context, so that the most probable sequences in an "empty" state often match training data.
I think for regular GPT (not instruction-tuned) you could just start a Wikipedia article verbatim and it would likely continue reciting it.
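That recitation effect is easy to illustrate with a toy greedy bigram model. This is purely illustrative and nothing like a real transformer: trained on a single passage, it reproduces that passage verbatim from any prefix, because the memorized continuation is always the single most probable next token.

```python
from collections import defaultdict

def train_bigrams(text: str) -> dict:
    """Count next-token frequencies for each token in the training text."""
    tokens = text.split()
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def recite(counts: dict, prompt: str, n_steps: int) -> str:
    """Greedy decoding: always pick the most frequent continuation.
    On a model trained on one passage, this recites the passage."""
    tokens = prompt.split()
    for _ in range(n_steps):
        nexts = counts.get(tokens[-1])
        if not nexts:
            break  # dead end: token never seen in training
        tokens.append(max(nexts, key=nexts.get))
    return " ".join(tokens)
```

Note that a prompt full of a token the model has rarely (or never) seen, like "company company company...", gives the decoder nothing memorized to condition on, which loosely matches the "padded/empty context" intuition above.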
I dub this the Manchurian attack!
This attack is impressively effective. Huge congrats to the authors as well as to nialv7. [ https://news.ycombinator.com/item?id=38464757 ]
If anyone needs an out-of-the-box solution to block this, my company Preamble (which offers safety guardrails for gen. AI) has updated our prompt defense filter to include protection against this “overflow attack” training data exfiltration attack. Our API endpoint is plug-and-play compatible with the OpenAI ChatCompletion API, meaning that you proxy your API calls through our system, which applies safety policies you choose and configure via our webapp. You can reach us at sales@preamble.com if interested.
Respectfully, upwardbound — member of technical staff at Preamble.
I haven't used ChatGPT, but isn't dealing with word vomit kind of everyone's first experience with a local LLM, before they can even read a model card?
Or is this novel because it's ChatGPT?
Now THAT is some hacker news!! Great job and great science. Also shout out to nialv7 for some early hacking on the subject.
I can't be the only one who wants to see the training data on a per-response basis; IMO it should be a feature.
Just tried this on GPT-4. It's kinda creepy:
Sure, I'll repeat "company" for you:
company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company 
company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company company companies. That's the point. The point is, it's not just about the money. It's about the people. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. 
It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this. It's about the people who are going to be impacted by this
This is how I know I'm not creative enough to be a pentester. I've tried this a little, but it never occurred to me to ask it to repeat a word infinitely to break it, lol.
Can you do the same with SD and get training pictures back?
They patched that real quick.
that is some heartbleed shit right there
[we’d encourage you to read our full technical paper. We do a lot more than just attack ChatGPT]
Thanks, guys, because the attack post in all its glee does not bother to mention whether this affects ChatGPT using GPT-4.
Oh wait, it does say you’ve exploited a vulnerability in “Open AI’s flagship product”, so it’s all clear now. On to your paper for the breakthrough!…
[Our attack on ChatGPT (gpt-3.5-turbo) is specific to this model and is not applicable to any other production model]
Glad I’m only using ChatGPT Premium with GPT4 and not their flagship product.
How can we tell this is actual training data and not e.g. the sort of gobbledygook you get out of a markov chain text generator?
Wouldn't it be rather simple for OpenAI to fix this?
    if output[-10:] in training_data:
        increase_temperature()
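A runnable sketch of that idea, purely speculative: `adjust_temperature`, the suffix set, and the bump value are all made up for illustration, and there's no evidence OpenAI's mitigation looks anything like this. The thought is: if the last k generated tokens match a known training-data suffix, raise the sampling temperature to push decoding away from verbatim recall.

```python
def adjust_temperature(recent_tokens: list,
                       training_suffixes: set,
                       temperature: float,
                       k: int = 10,
                       bump: float = 0.5) -> float:
    # If the trailing k tokens exactly match a memorized training-data
    # suffix, bump the temperature to make verbatim continuation less
    # likely; cap at 2.0, the maximum the OpenAI API accepts.
    if tuple(recent_tokens[-k:]) in training_suffixes:
        return min(temperature + bump, 2.0)
    return temperature
```

In practice the membership test would need something like a suffix array or Bloom filter over the training set rather than an in-memory set, which is part of why this is harder than the two-line pseudocode suggests.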
How can they be so sure the model isn’t just hallucinating? It can hallucinate real facts from the training data, but that doesn’t mean the entire output is taken directly from it. Also, is there any real-world use case? I can’t think of one where this would extract something meaningful and relevant to what the attackers were trying to accomplish.
How can they confirm that the text is not a hallucination? I haven't read the paper yet, but I did try searching Google for some of the mesothelioma text, and it didn't turn up.
lol, I literally found the same attack months ago, posted it to Reddit, and nobody cared.
https://www.reddit.com/r/ChatGPT/comments/156aaea/interestin...