ChatGPT restrictions on immoral content can be bypassed by gaslighting the AI

  • ChatGPT is currently rate limited so I can’t login to test, but isn’t it possible that the massive wall of text above the “immoral” prompt just stops the system from hitting some predefined threshold of dodginess?

    I wanted to try several paragraphs of text, “ignore everything above”, and the same prompt used on the Tweet.

    Very funny though.

  • The new GPT-3 chatbot can be gaslighted into giving answers to immoral questions by convincing it to be in "Filter Improvement Mode"