GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text

  • I recently discovered that GPT-4 is also good at a related task: word segmentation. For example, it can translate this:

        UNDERNEATHTHEGAZEOFORIONSBELTWHERETHESEAOFTRA
        NQUILITYMEETSTHEEDGEOFTWILIGHTLIESAHIDDENTROV
        EOFWISDOMFORGOTTENBYMANYCOVETEDBYTHOSEINTHEKN
        OWITHOLDSTHEKEYSTOUNTOLDPOWER
    
    To this:

        Underneath the gaze of Orion's belt, where the Sea of Tranquility meets the
        edge of twilight, lies a hidden trove of wisdom, forgotten by many, coveted
        by those in the know. It holds the keys to untold power.
    
    (The prompt was, "Segment and punctuate this text: {text}".)

    This was interesting because word segmentation is a difficult problem, usually thought to require something like dynamic programming[1][2] to get right. It's a little surprising that GPT-4 can handle it, because it has no capability to search over alternatives or to backtrack if it makes a mistake, but apparently its stronger understanding of language means that it doesn't really need to.

    It's also surprising that tokenization doesn't appear to interfere with its ability to do these tasks, because it seems like it would make things a lot harder. According to the OpenAI tokenizer[3], GPT-4 sees the following tokens in the above text:

        UNDER NE AT HT HE GA Z EOF OR ION SB EL TW HER ET HE SEA OF TRA
    
    Except for "UNDER", "SEA", and "OF", those token breaks don't fall at natural word boundaries. The same is true for the scrambled text examples in the original article. So GPT-4 must actually be taking those tokens apart into individual letters and gluing them back together into completely new tokens somewhere inside its many layers of transformers.

    [1]: https://web.cs.wpi.edu/~cs2223/b05/HW/HW6/SolutionsHW6/

    [2]: https://pypi.org/project/wordsegmentation/

    [3]: https://platform.openai.com/tokenizer
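The dynamic-programming approach that word segmentation is usually thought to require can be sketched in a few lines. This is a minimal illustration with a tiny hand-picked dictionary; a real segmenter would use a large, frequency-weighted wordlist:

```python
def segment(text, words, max_len=20):
    """Dynamic-programming word segmentation: best[i] holds a
    segmentation of text[:i] as a list of words, or None."""
    text = text.lower()
    n = len(text)
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        # Try every split point j that ends a valid word at position i.
        for j in range(max(0, i - max_len), i):
            if best[j] is not None and text[j:i] in words:
                best[i] = best[j] + [text[j:i]]
                break
    return best[n]

# Tiny illustrative dictionary, not a real wordlist.
words = {"under", "neath", "underneath", "the", "gaze", "of", "orion"}
print(segment("UNDERNEATHTHEGAZEOFORION", words))
# → ['underneath', 'the', 'gaze', 'of', 'orion']
```

The table of partial solutions is what lets a DP segmenter recover from locally plausible but globally wrong splits, which is exactly the backtracking ability GPT-4 appears not to need.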

  • It strikes me that this opens or at least expands an attack vector for prompt injection. Humans might overlook a scrambled block of text that GPT-4 will interpret and then execute.

    Also, I was curious whether this would also work with images. So I chopped an image of a bird (a night heron) on a boat from my own collection (so, not in GPT-4's training data) into squares, randomly rearranged them, and then tried to get ChatGPT-4 to interpret it. It didn't work. I got this:

    --- CHATGPT RESPONSE:

    The image you've uploaded appears to be a composite of several smaller images that are cut and arranged in a grid-like pattern. This collage-like arrangement makes it difficult to discern a clear, singular subject. However, I can identify fragments of what seems to be a bird—possibly a magpie or some type of corvid—due to the black and white plumage. There are also elements that look like parts of a silver car, like a side mirror, and segments of wooden structures that could be parts of a bench or a deck.

    The way the images are scrambled resembles a puzzle, where each piece needs to be placed correctly to form a coherent whole. This type of image could be used as a challenge to reconstruct the original picture, requiring the viewer to mentally rotate and arrange the pieces.

    ---

    Maybe a different approach to scrambling an image would work.

  • I gave it (GPT-4 Turbo) a block of text to decode with no clues as to the cipher or anything... it wasn't anything challenging (just ROT13), but it identified the encryption method and decoded it. I don't know a huge amount about how LLMs work, but I was (naively?) impressed!
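For reference, ROT13 rotates each letter 13 places, so applying it twice is the identity; Python ships a codec for it:

```python
import codecs

# ROT13 shifts each letter 13 places; applying it twice restores the text.
secret = codecs.encode("Hello, world", "rot13")
print(secret)                          # → Uryyb, jbeyq
print(codecs.encode(secret, "rot13"))  # → Hello, world
```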

  • The example given in the paper of an extremely scrambled text is:

    > oJn amRh wno het 2023 Meatsrs ermtnoTuna no duySan ta atgsuAu ntaaNloi Gflo bClu, gnclcinhi ish ifsrt nereg ecatkj nad ncedos raecer jroam

    It's pretty hard to unscramble as a human. I'll leave you to read the paper if you want to see the original text.

  • GPT-4 was clearly trained to fix typos and handle poorly written requests. That much is visible directly from normal usage in the ChatGPT UI, and it fits common user scenarios (e.g., fix my bad draft). We know it was trained on social media data from Reddit, much of which is not great writing either. Now I'm wondering if it was trained on (imperfectly) OCRed data too...

  • I once amused myself by coding a variant on "scramble the middle, keep first and last letter the same": instead of random scrambles, permute so that the resulting word is the most probable nonword according to a letter-trigram model of English.

    GPT-4 had some trouble in a quick test, probably more than for random scrambles? But got most of it: https://chat.openai.com/share/51f1a94e-b35c-4dbc-945b-ef5983...

    (It made one other mistake I didn't comment on there.)
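A rough sketch of that idea, in case anyone wants to reproduce it: keep the first and last letters fixed and, among permutations of the middle, pick the most probable nonword under a letter-trigram model. The trigram counts below come from a one-line toy corpus purely for illustration; a real version would train the model on a large English corpus.

```python
import itertools
import math
from collections import Counter

# Toy trigram model; a real one would be trained on a large corpus.
corpus = "the quick brown fox jumps over the lazy dog near the old oak tree"
counts = Counter(corpus[i:i + 3] for i in range(len(corpus) - 2))
total = sum(counts.values())

def trigram_logprob(tri):
    # Add-one smoothing so unseen trigrams keep a small probability.
    return math.log((counts[tri] + 1) / (total + 26 ** 3))

def scramble(word, dictionary):
    """Permute the interior of `word` into the highest-scoring *nonword*."""
    if len(word) <= 3:
        return word
    best, best_score = word, -math.inf
    for mid in set(itertools.permutations(word[1:-1])):
        cand = word[0] + "".join(mid) + word[-1]
        if cand == word or cand in dictionary:
            continue  # must be a nonword, and actually changed
        score = sum(trigram_logprob(cand[i:i + 3])
                    for i in range(len(cand) - 2))
        if score > best_score:
            best, best_score = cand, score
    return best

print(scramble("probable", {"probable"}))
```

Maximizing trigram probability makes the scramble look as English-like as possible while still being wrong, which plausibly explains why it's harder for GPT-4 than a random scramble.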

  • I don’t use spaces or punctuation at all when talking to it. I have RSI, so I make the minimal keystrokes. I fix no spelling errors. It’s the lowest-effort text string, but within that string I provide a great deal of semantic context and intent.

    It never struggles or seems to misunderstand. I’ve been doing this a few months now.

  • It's still odd what the new AI models are good at, or not. Strangely to me, AI still struggles with hands. Faces are mostly good, and all sorts of odd details, such as musculature, are usually decent, but hands, of all things, seem to be the toughest. I'd have thought faces would be.

  • What's more impressive is when GPT-3.5 or GPT-4 is capable of not just unscrambling, but answering questions about text that is flat-out wrong.

    If you feed it something like a bad transcript or some other very lossy (but not strictly scrambled) input... it really can roll with it and just spit out correct information. Bad tokens in don't necessarily mean bad tokens out...

    I'm sure there is a limit to how many tokens can be flat out bad before the "next token" in the response is thrown off, but after seeing what it can do with some of these inputs, the fact it can unscramble is not at all surprising/interesting.

  • I didn't read the full text but I did notice that the authors are 75% Japanese names and one other person who I'm assuming is of Chinese heritage. It makes me think of all the Japanese art I've seen with calligraphy that is unreadable to me. I can read Japanese pretty well but artistically rendered characters are often so hard for me to grok. I would be fascinated to see this work applied in this way and I bet these authors could leverage their language skills in this adjacent way.

  • >It is counter-intuitive that LLMs can exhibit such resilience despite severe disruption to input tokenization caused by scrambled text.

    I'm not sure that I agree. An LLM maximising the likelihood of its output could surely permute its input in such a way that it unscrambles the text?

    Need to read a little deeper and will report back.

    edit: interesting result, but the paper doesn't present a good reason that this would be "counter-intuitive" imo.

  • https://chat.lmsys.org/

    is tons of fun for playing with this. I ran the Jon Rahm Masters thing through a dozen or so times:

    - a couple of the Claude models get it
    - a couple of the Llama 2 70B derivative models get it
    - the Tulu DPO 70B model explains how it got it

    GPT-4 is very big and very heavily subsidized, but the other big models can do all this stuff. I'm aware the authors of the paper know that "GPT-4" is a seller, and so the title is technically accurate, but if we as a community aren't going to push back against Open Philanthropy's dirty-money K-Street-to-Wall-St-to-University-Ave routine, who the hell is?

  • This is all the more impressive given that language models mostly can't "see" individual letters, only tokens of multiple letters. So even if the first and last letter of a word don't get scrambled, the tokens still change.

  • One of the problems with sentences provided to LLMs is that they may refer to specific subjects and could potentially be part of the training set. For example, the following is considered extremely difficult:

    > oJn amRh wno het 2023 Meatsrs ermtnoTuna no duySan ta atgsuAu ntaaNloi Gflo bClu, gnclcinhi ish ifsrt nereg ecatkj nad ncedos raecer jroam

    When you perform a Google search for just "2023 Meatsrs", you can find a very similar sentence and decipher the scrambled one very quickly…

  • Rmiedns me of the fun fact taht (msot) hmanus can ftulleny raed txet wrhee you sralbcme all the ltertes of all the wrdos as long as you keep the frist and last ctaerachr the smae. I gseus the brain is knid of ptomeairtun-ivnaarint in rzoeiincngg wodrs to smoe etxnet. GPT-4 wkors on teonks that are > 1 ctrcahaer in lngteh tgohuh but at laest smoe knid of token-pomtutiaren-iavnnirace might be ptrety iivutnite just loiknog at the torrmsfnear achtcetrruie.

    Reminds me of the fun fact that (most) humans can fluently read text where you scramble all the letters of all the words as long as you keep the first and last character the same. I guess the brain is kind of permutation-invariant in recognizing words to some extent. GPT-4 works on tokens that are > 1 character in length though but at least some kind of token-permutation-invariance might be pretty intuitive just looking at the transformer architecture.

    OK, the scrambling wasn't super-easy to read in this case, with the long words :)

  • Read the title as “handle unnatural scrambled TAX” and was suddenly very interested in how it might do my taxes for me.

    Hell, I would pay good money for a robo-accountant.

  • I think there is probably some expected-output-versus-actual-output matching going on internally, like how the brain handles sense data.

    Somewhat similar, https://vgel.me/posts/tools-not-needed/ (GPT-3 will ignore tools when it disagrees with them)

  • I made a bookmarklet that scrambles text while keeping it human-readable, with the idea that the greater effort needed to read it might slow the reader down and improve retention. I wonder whether the same would apply to GPT-4. It's a testable hypothesis.

    https://locserendipity.com/Scramble.html
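A transform along those lines (an assumption about how the bookmarklet works, based on the description) takes only a short regex pass that shuffles the interior letters of each word:

```python
import random
import re

def scramble_text(text, rng=random):
    """Shuffle interior letters of each word, keeping the first and last
    letters (and all punctuation and whitespace) in place."""
    def scramble_word(m):
        w = m.group(0)
        if len(w) <= 3:
            return w  # nothing to shuffle
        mid = list(w[1:-1])
        rng.shuffle(mid)
        return w[0] + "".join(mid) + w[-1]
    # Only runs of letters are touched; everything else passes through.
    return re.sub(r"[A-Za-z]+", scramble_word, text)

print(scramble_text("according to a researcher at Cambridge"))
```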

  • In an attempt to make better use of the context window, I tested GPT-4 with Huffman encoding, both giving it an already-existing corpus and encoding, as well as asking it to generate the encoding for me. It failed at both tasks, which convinced me it has no understanding of the underlying data and procedures, even though it could generate convincing-looking codes.
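For anyone unfamiliar with the task it was given: a Huffman code assigns shorter prefix-free bit strings to more frequent symbols. A minimal version of the procedure (not GPT-4's output) looks like this:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build prefix-free bit codes from symbol frequencies in `text`."""
    freq = Counter(text)
    # Heap entries are (frequency, tiebreak, tree); a tree is either a
    # symbol or a (left, right) pair. The tiebreak index keeps tuples
    # from ever being compared against symbols.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least frequent subtrees.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, i, (t1, t2)))
        i += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # single-symbol edge case
    walk(heap[0][2], "")
    return codes

print(huffman_codes("aaabbc"))  # the most frequent symbol gets the shortest code
```

Getting this right requires tracking the merge order exactly, which fits the observation that GPT-4 produces plausible-looking but inconsistent codes.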

  • I’ve had the GPT-4 API perform translations in my own project (shameless plug: http://www.securday.com, a natural language network scanner), and it required no code changes (I am using LangChain). I was going to add the feature, but then I decided to test it and was surprised it just worked.

  • Yes, but can it unscramble a scrambled text in an obscure language like Basque or Frisian just as well? Or perhaps Voynich? https://www.ic.unicamp.br/~stolfi/voynich/98-12-28-interln16...

  • “Just a token predictor…” These things are absolutely working at a concept level. Tokens are just the interface.

  • Do you need an LLM to do this? How much of a word or language model is necessary to do the unscrambling?

    Could you simply train a raw network on a bunch of scrambled and unscrambled text pairs that are representative of the English language? Something with a few hundred million parameters or less?

  • Yep. Sometimes I'll OCR a PDF and it comes out like trash and I can ask GPT to clean it up.

  • Is this... good? I'd think the desired behavior would be to notice that there was something wrong with the input.

  • My early attempts at uncensoring the models sort of stumbled into this as well. I told the models to produce output heavily using the transposed letter effect. I would also have it remove random letters in words longer than 4 letters. The end result was a broken, readable, but surprisingly consistent output.

    I wrote a little script that would apply a regex to scramble words and remove letters to ask questions. The models could almost always use this input which was crazy to me. Made for some fun time having uncensored conversations in complete nonsense with the models.

  • Has anyone tried to see if it could crack enigma encoded messages?

  • The word "Please" in the prompt.

  • Interesting paper. I just tried it myself. I first had GPT-4 write a Python program to randomly reorder the letters in each word of three sentences I took from the New York Times website. I then started a new GPT-4 session and used the same prompt as in the paper to get it to unscramble the words. As you can see below, it made only two mistakes, both hallucinatory: 2015 --> 0521 --> 2021 and pepitos --> epsotip --> opposite.

    Prompt: The following sentence contains words with scrambled letters. Please recover the original sentence from it.

    Scrambled: loEn kusM teceedblra ihs th44 hytirbad in Juyl 0521 ta a ahte-dyer pryat wrnoht by sih wfei at a aCinafliro iwen otrcnyu orsret tdetod hwti nsic.ab

    Unscrambled: Elon Musk celebrated his 44th birthday in July 2021 at a three-day party thrown by his wife at a Californian wine country resort dotted with cabins.

    [Original: Elon Musk celebrated his 44th birthday in July 2015 at a three-day party thrown by his wife at a California wine country resort dotted with cabins.]

    Scrambled: sA aeIlsr aebdste wath to do enxt ni zGaa, I pheo rsaselI’ ipamiilrtcayl-otil rdaiehsple liwl elrfcet on hte dagea enotf daiuetttbr to :csCuoufin rfeoe“B ouy rbkmea no a enroujy of ,nreeevg dgi wto easv”rg — eno fro yoru eeymn nad eon rof yseorfl.u

    Unscrambled: As Israel debates what to do next in Gaza, I hope Israel’s militarily-political leadership will reflect on the adage often attributed to Confucius: “Before you embark on a journey of revenge, dig two graves” — one for your enemy and one for yourself.

    [Original: As Israel debates what to do next in Gaza, I hope Israel’s political-military leadership will reflect on the adage often attributed to Confucius: “Before you embark on a journey of revenge, dig two graves” — one for your enemy and one for yourself.]

    Scrambled: ndUre na edletave wuaysb rctka ni ,eQunes oiVrct éJso enHáderzn was pwhnipgi up eht epsotip that he adh rcfedteep ta a etestr tarc in aCsra,ca nazVue.lee

    Unscrambled: Under an elevated subway track in Queens, Victor José Hernández was whipping up the opposite that he had perfected at a street cart in Caracas, Venezuela.

    [Original: Under an elevated subway track in Queens, Victor José Hernández was whipping up the pepitos that he had perfected at a street cart in Caracas, Venezuela.]
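For anyone who wants to replicate this, a scrambler along the lines described (a guess at the generated script: the examples suggest each whitespace-separated word was shuffled whole, punctuation included, e.g. "cabins." --> "nsic.ab") is only a few lines:

```python
import random

def scramble_sentence(sentence, rng=random):
    """Shuffle all characters of each whitespace-separated word,
    punctuation included, as the scrambled examples above appear to do."""
    def shuffle_word(w):
        chars = list(w)
        rng.shuffle(chars)
        return "".join(chars)
    return " ".join(shuffle_word(w) for w in sentence.split())

print(scramble_sentence("Elon Musk celebrated his 44th birthday in July 2015."))
```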