Need to show this to people claiming "it's just statistical inference, these models can't demonstrate any '''understanding'''", as if understanding has been proven to be something else. These people assume every single bit of intelligence these models show is somehow already in their training set, which is patently false and very easy to demonstrate as false.
In the first week ChatGPT was public, more than a year ago, I tried to make the early model play along with me in inventing a new programming language with novel syntax. After some back and forth, it could translate my JavaScript samples into the new programming language while respecting the new language's semantics, and it could even simulate running simple pieces of code. Sure, it made token errors here and there, but it was working. It understood what I was telling it and responded in kind.
Over the months since, things have only gotten better. So I'm not surprised by the results of this post, but I'm still astonished all the same.
You may be interested in a similar experiment from the Gemini tech report: https://twitter.com/jeffdean/status/1758182184694005787?s=46...
I will always find that name "Claude" very funny for some reason. Can't wait for Jean-Christophe 2, and Patrick 7
> To test for possible contamination, I tried the same prompts without attaching the sample translations and Claude failed and refused to answer, saying that it is unfamiliar with the Circassian language.
This doesn't indicate that Claude is unfamiliar with Circassian, only that Circassian is sufficiently rare that refusing to answer is a plausible response.
The language is not that obscure in the grand scheme of things. There's a Wikipedia article explaining the grammar, https://en.wikipedia.org/wiki/Kabardian_grammar, which is almost certainly in Claude's training set, probably alongside a few hundred linguistics papers and a bunch of monolingual data.
If you measured the performance for different numbers of initial translation examples, I suspect there would be a sudden jump at the point where Claude stops refusing to even try, and that after that, additional examples would only marginally improve the output.
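The sweep proposed above could be sketched roughly as follows. This is a toy illustration only: `query_model` is a hypothetical stand-in for a real API call (not any actual provider's API), and the refusal threshold inside it is invented purely to show the shape of the expected jump, not a measured result.

```python
def build_prompt(examples, sentence):
    """Assemble a few-shot translation prompt from (source, translation) pairs."""
    shots = "\n".join(f"Circassian: {src}\nEnglish: {tgt}" for src, tgt in examples)
    return f"{shots}\nCircassian: {sentence}\nEnglish:"

def query_model(prompt, n_examples):
    # Stub standing in for a real model call. The threshold of 3 is
    # an invented placeholder for "the point where the model stops
    # refusing", not an observed value.
    if n_examples < 3:
        return None  # model refuses to attempt a translation
    return "placeholder translation"

def sweep(example_pool, test_sentence, sizes=(0, 1, 2, 3, 5, 10)):
    """Run the same test sentence with increasing numbers of in-context examples."""
    results = {}
    for n in sizes:
        prompt = build_prompt(example_pool[:n], test_sentence)
        results[n] = query_model(prompt, n)
    return results
```

In a real experiment you would score each non-refusal output against a reference translation (e.g. with a character-level metric, since Kabardian is morphologically rich) and plot score against example count to look for the hypothesized jump.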