Upvote this enough and they'll be able to solve it next week, the same way they solve most things: by memorizing the answer from "training".
LLMs exhibit the characteristics of "savant syndrome".
Tried this with Gemini and it face-planted right at the start by mis-parsing the position of the black king and pawn :(
Chess is a classic case for which LLMs are structurally inadequate. The strength of LLMs is that they can "short-circuit" and arrive at the right answer by the wrong methods for squishy problems like language. [1] Chess is about precision, and I'm awful at it myself because of the line noise in my schizotypal brain, even if I can compensate pretty well for it when doing hard math. Short-circuiting just doesn't cut it for chess, which is why you get the invalid moves (the one non-negotiable requirement for a chess program is that it never makes invalid moves).
Chess is a tree-search problem, even if human intuition means a player might consciously consider only about 20 board positions where my α-β program has to look at 2-20 million positions to challenge a serious beginner. Yes, this particular problem could be solved by a lookup in an endgame tablebase; the 5-piece tables are reasonable to ship with a specialized chess program (think how big an AAA game is!) but not reasonable to store inside an LLM that does many things other than chess.
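To make that concrete, here's the shape of the search: a minimal negamax-style α-β sketch in Java. `GameState`, `legalSuccessors`, and `evaluate` are hypothetical stand-ins for a real board representation and evaluation function, not code from my engine:

```java
import java.util.List;

// Hypothetical stand-in for a real board representation with
// move generation and a static evaluation function.
interface GameState {
    List<GameState> legalSuccessors(); // positions after each legal move
    int evaluate();                    // score from the side-to-move's view
    boolean isTerminal();              // checkmate, stalemate, draw, etc.
}

final class AlphaBeta {
    // Fixed-depth negamax with alpha-beta pruning. The [alpha, beta]
    // window bounds the scores we still care about; branches proven
    // irrelevant are cut off, which is why this visits far fewer
    // nodes than plain minimax for the same result.
    static int search(GameState state, int depth, int alpha, int beta) {
        if (depth == 0 || state.isTerminal()) {
            return state.evaluate();
        }
        int best = Integer.MIN_VALUE + 1; // +1 so negation can't overflow
        for (GameState child : state.legalSuccessors()) {
            int score = -search(child, depth - 1, -beta, -alpha);
            if (score > best) best = score;
            if (best > alpha) alpha = best;
            if (alpha >= beta) break; // beta cutoff: opponent avoids this line
        }
        return best;
    }
}
```

Even with pruning, the node counts explode with depth, which is the 2-20 million figure above.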
To make my chess program able to challenge better players, the direct path is to make it look at more positions, and most of that is optimizing the code: allocating no objects in the inner loop, updating the evaluation function incrementally instead of recomputing it, keeping the transposition table off-heap, etc. Not the path of ChatGPT-o.
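For a taste of what "optimizing the code" means here, below is a sketch of an allocation-free transposition table: two parallel long arrays instead of a HashMap, so probing and storing in the inner loop never create garbage. The Zobrist hashing, entry packing, and always-replace policy are simplified assumptions for illustration; a truly off-heap version would put the same layout in a direct ByteBuffer:

```java
// Allocation-free transposition table: parallel primitive arrays,
// indexed by the low bits of the position's Zobrist key.
final class TranspositionTable {
    private final long[] keys;    // full Zobrist key, to detect collisions
    private final long[] entries; // packed: score, depth, bound type, move
    private final int mask;       // table size must be a power of two

    TranspositionTable(int sizePowerOfTwo) {
        keys = new long[sizePowerOfTwo];
        entries = new long[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    void store(long zobristKey, long packedEntry) {
        int slot = (int) (zobristKey & mask);
        keys[slot] = zobristKey;      // always-replace: simplest policy
        entries[slot] = packedEntry;
    }

    // Returns the packed entry, or 0 if this position isn't stored.
    long probe(long zobristKey) {
        int slot = (int) (zobristKey & mask);
        return keys[slot] == zobristKey ? entries[slot] : 0L;
    }
}
```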
It is possible to make a neural chess engine that picks better candidate moves and doesn't have to look at as many positions [2], but it's a specialized thing; chess just isn't a model of general intelligence.
[1] I was working on foundation models the summer BERT came out and had a technique of "predictive evaluation" that let us look at a small number of examples in detail and decide what the potential and weaknesses of a system were. By this method I saw BERT as a huge step forward, yet this method entirely discounted "getting the right answer by the wrong path," so we might have rejected the very path that led to today's LLMs.
[2] https://en.wikipedia.org/wiki/AlphaZero