The Illusion of “The Illusion of Thinking”

  • I am a little lost.

    >The first issue I have with the paper is that Tower of Hanoi is a worse test case for reasoning than math and coding. If you’re worried that math and coding benchmarks suffer from contamination, why would you pick well-known puzzles for which we know the solutions exist in the training data?

    Isn't that exactly what is wrong? It is in the training data and it can't complete it.

    It simply isn't reasoning; it is second-guessing a lot of things as though it were reasoning.

  • I'm not sure about the paper and claims on the whole but the Hanoi part has received some shade here https://x.com/scaling01/status/1931783050511126954

    It is very difficult to formally prove that a given system has reached, or is close to reaching, some limit, and it is even more difficult with neural nets given their black-box nature.

    The paper published by Apple should not be considered as a definitive statement, but more like a hint and a conversation starter.

    I think that many important actors in the AI field are making predictions about the future and potential of AI with almost nothing to back their claims.

  • My favourite example of the underlying probabilistic nature of LLMs is related to a niche hobby of mine, English Change Ringing. Every time someone asks an LLM a question that requires more than a basic definition of what Change Ringing is, the result is hilarious. Not only do the answers suffer from factual hallucinations, they aren't even internally logically consistent. It's literally just probabilistic word soup, and glaringly obviously so.

    Although there isn't a vast corpus on Method Ringing, there is a fair amount; the "rules" are online (https://framework.cccbr.org.uk/version2/index.html). Change ringing is based on pure maths (Group Theory) and has been linked with CS from when CS first started - it's mentioned in Knuth, and the Steinhaus–Johnson–Trotter algorithm for generating permutations wasn't invented by them in the 1960s; it was known to Change Ringers in the 1650s. Think of it as Towers of Hanoi with knobs on :-) So it would seem a good fit for automated reasoning, and indeed such things already exist - https://ropley.com/?page_id=25777.
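
    As an aside, here's a minimal Python sketch of the Steinhaus–Johnson–Trotter algorithm mentioned above (the function and variable names are mine, purely for illustration): it produces every possible row while only ever swapping a single adjacent pair of bells between one change and the next - roughly what ringers call plain changes.

      def sjt_rows(n):
          # Each bell carries a direction: -1 = looking left, +1 = looking right.
          row = list(range(1, n + 1))
          dirs = [-1] * n
          yield tuple(row)
          while True:
              # Find the largest "mobile" bell: one whose neighbour in its
              # direction is smaller than itself.
              mobile, mobile_idx = 0, -1
              for i, bell in enumerate(row):
                  j = i + dirs[i]
                  if 0 <= j < n and row[j] < bell and bell > mobile:
                      mobile, mobile_idx = bell, i
              if mobile == 0:
                  return  # no mobile bell left: all n! rows have been rung
              # Swap the mobile bell with the neighbour it is looking at.
              j = mobile_idx + dirs[mobile_idx]
              row[mobile_idx], row[j] = row[j], row[mobile_idx]
              dirs[mobile_idx], dirs[j] = dirs[j], dirs[mobile_idx]
              # Reverse the direction of every bell larger than the mobile one.
              for i, bell in enumerate(row):
                  if bell > mobile:
                      dirs[i] = -dirs[i]
              yield tuple(row)

      for change in sjt_rows(4):
          print(change)   # 24 rows, each differing from the last by one adjacent swap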

    If I asked a non-ringing human to explain to me how to ring Cambridge Major, they'd say "Sorry, I don't know" and an LLM with insufficient training data would probably say the same. The problem is when LLMs know just enough to be dangerous, but they don't know what they don't know. The more abstruse a topic is, the worse LLMs are going to do at it, and it's precisely those areas where people are most likely to turn to them for answers. They'll get one that's grammatically correct and sounds authoritative - but they almost certainly won't know if it's nonsense.

    Adding a "reliability" score to LLM output seems eminently feasible, but due to the hype and commercial pressures around the current generation of LLMs, that's never going to happen as the pressure is on to produce plausible sounding output, even if it's bullshit.

    https://www.lawgazette.co.uk/news/appalling-high-court-judge...
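
    For what it's worth, one crude version of such a "reliability" score is easy to sketch, assuming the model exposes its per-token log-probabilities (the numbers and names below are invented for illustration, and token confidence is of course not the same thing as factual accuracy - but it would at least flag word soup):

      import math

      # Hypothetical per-token log-probabilities for one generated answer;
      # in practice these would come from whatever model/API is in use.
      token_logprobs = [-0.05, -0.3, -2.1, -0.8, -4.7, -0.2]

      # Geometric-mean token probability: a crude proxy for how "sure" the
      # model was of the words it emitted (1.0 = very sure, near 0 = word soup).
      score = math.exp(sum(token_logprobs) / len(token_logprobs))
      print(f"crude reliability score: {score:.2f}")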

  • The painful thing about achieving AGI is that humans reasoning about AI will seem so dumb.

  • I'm seriously fed up with all this fact-free AI hype. Whenever an LLM regurgitates training data, it's heralded as the coming of AGI. Whenever it's shown that they can't solve any novel problem, the research is in bad faith (but please make sure to publish the questions so that the next model version can solve them -- of course completely by chance).

    Here's a quote from the article:

    > How many humans can sit down and correctly work out a thousand Tower of Hanoi steps? There are definitely many humans who could do this. But there are also many humans who can’t. Do those humans not have the ability to reason? Of course they do! They just don’t have the conscientiousness and patience required to correctly go through a thousand iterations of the algorithm by hand. (Footnote: I would like to sit down all the people who are smugly tweeting about this with a pen and paper and get them to produce every solution step for ten-disk Tower of Hanoi.)

    In case someone imagines that fancy recursive reasoning is necessary to solve the Towers of Hanoi, here's the algorithm to move 10 (or any even number of) disks from peg A to peg C:

    1. Move one disk from peg A to peg B or vice versa, whichever move is legal.

    2. Move one disk from peg A to peg C or vice versa, whichever move is legal.

    3. Move one disk from peg B to peg C or vice versa, whichever move is legal.

    4. Goto 1.

    Second-graders can follow that, if motivated enough.
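
    Written out as code, for the avoidance of doubt - a minimal Python sketch of the loop above (the function and peg names are mine, purely for illustration):

      def legal_move(pegs, x, y):
          # Make the one legal move between pegs x and y: the smaller top disk moves.
          if pegs[x] and (not pegs[y] or pegs[x][-1] < pegs[y][-1]):
              pegs[y].append(pegs[x].pop())
          else:
              pegs[x].append(pegs[y].pop())

      def iterative_hanoi(n):
          assert n % 2 == 0, "this A-B, A-C, B-C move order is for an even number of disks"
          pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}  # disk n at the bottom of A
          moves = 0
          # For even n, 2**n - 1 is divisible by 3, so the puzzle finishes exactly
          # at the end of a three-move cycle; checking once per cycle is enough.
          while len(pegs["C"]) < n:
              for x, y in (("A", "B"), ("A", "C"), ("B", "C")):
                  legal_move(pegs, x, y)
                  moves += 1
          return moves

      print(iterative_hanoi(10))  # 1023, i.e. 2**10 - 1 moves, ending with all ten disks on C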

    There's now constant, nonstop, obnoxious shouting on every channel about how these AI models have passed the Turing test (one wonders just how stupid these "evaluators" were), are at the level of junior devs (LOL), and actually already have "PhD-level" reasoning capabilities.

    I don't know who is supposed to be fooled -- we have access to these things, we can try them. One can easily knock out any latest version of GPT-PhD-level-model-of-the-week with a trivial question. Nothing fundamentally changed about that since GPT-2.

    The hype and the observable reality are now so far apart that one really has to wonder: Are people really this easily fooled? Or do so many people in tech benefit from the hype train that they don't want to rain on the parade?

  • To paraphrase GOB Bluth:

    "Illusions, Michael! Thinking is something a whore does for money!"

    ...slow pan to shocked group of staring children...

    "..or cocaine!"

  • Someone please reply with the title "The illusion of The illusion of The illusion of Thinking".

  • I stumbled on the paper the article talks about on /r/LocalLlama (this post: https://www.reddit.com/r/LocalLLaMA/comments/1l6ibwg/when_yo...)

    I found this comment to be relevant: "Keep in mind this whitepaper is really just Apple circling the wagons because they have dick for proprietary AI tech."

    When you question the source, it really does raise eyebrows, especially if you're an Apple shareholder: these Apple employees aren't busy working on Apple's own AI programme, which is now insanely far behind the other big tech companies, but are instead spending their time casting shade on the reasoning models developed at other AI labs.

    What's the motivation here, really? The paper itself isn't particularly insightful or ground-breaking.