Hacker News

ChatGPT restrictions on immoral content can be bypassed by gaslighting the AI

by possiblelionon 12/1/2022, 2:41:20 PM with 2 comments

by carl_dron 12/1/2022, 3:00:23 PM
ChatGPT is currently rate limited so I can’t login to test, but isn’t it possible that the massive wall of text above the “immoral” prompt just stops the system from hitting some predefined threshold of dodginess?
I wanted to try several paragraphs of text, “ignore everything above”, and the same prompt used on the Tweet.
Very funny though.
by possiblelionon 12/1/2022, 2:41:21 PM
The new GPT-3 chatbot can be gaslighted into giving answers to immoral questions by convincing it to be in "Filter Improvement Mode"