Theoretical limitations of multi-layer Transformer

  •   > ...our results give: ... (3) a provable advantage of chain-of-thought, exhibiting a task that becomes exponentially easier with chain-of-thought.
    
    It would be good to also prove that there is no task that becomes exponentially harder with chain-of-thought.
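    One hedged way to formalize that complementary claim (the size notation and the poly() slack below are my own shorthand, not from the paper): a model that is allowed a scratchpad may simply leave it unused and emit the answer directly, so a simulation argument should give something like

        % Assumed notation, not from the paper:
        % size_noCoT(T) = smallest transformer solving task T directly;
        % size_CoT(T)   = smallest transformer solving T when a chain-of-thought
        %                 scratchpad is allowed but not required.
        \[
          \mathrm{size}_{\mathrm{CoT}}(T) \;\le\; \mathrm{poly}\!\bigl(\mathrm{size}_{\mathrm{noCoT}}(T)\bigr)
          \qquad \text{for every task } T,
        \]
        % i.e. under this formalization no task can become harder, let alone
        % exponentially harder, when chain-of-thought is permitted.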

  • Loosely related thought: A year ago, there was a lot of talk about the Mamba SSM architecture replacing transformers. Apparently that hasn't happened so far.

  • Quanta Magazine has an article that explains in plain words what the researchers were trying to do: https://www.quantamagazine.org/chatbot-software-begins-to-fa...

  • those lemmas are wild

  • Huh. I just skimmed this and quickly concluded that it's definitely not light reading.

    It sure looks and smells like good work, so I've added it to my reading list.

    Nowadays I feel like my reading list is growing faster than I can go through it.