Theoretical limitations of multi-layer Transformer

  •   > ...our results give: ... (3) a provable advantage of chain-of-thought, exhibiting a task that becomes exponentially easier with chain-of-thought.
    
    It would be good to also prove that there is no task that becomes exponentially harder with chain-of-thought.
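    One hedged way to formalize that complementary claim (the size notation and the poly() slack below are my own shorthand, not from the paper): a model that is allowed a scratchpad may simply leave it unused and emit the answer directly, so a simulation argument should give something like

        % Assumed notation, not from the paper:
        % size_noCoT(T) = smallest transformer solving task T directly;
        % size_CoT(T)   = smallest transformer solving T when a chain-of-thought
        %                 scratchpad is allowed but not required.
        \[
          \mathrm{size}_{\mathrm{CoT}}(T) \;\le\; \mathrm{poly}\!\bigl(\mathrm{size}_{\mathrm{noCoT}}(T)\bigr)
          \qquad \text{for every task } T,
        \]
        % i.e. under this formalization no task can become harder, let alone
        % exponentially harder, when chain-of-thought is permitted.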

  • Loosely related thought: A year ago, there was a lot of talk about the Mamba SSM architecture replacing transformers. Apparently that hasn't happened so far.

  • Quanta Magazine has an article that explains in plain words what the researchers were trying to do: https://www.quantamagazine.org/chatbot-software-begins-to-fa...

  • those lemmas are wild

  • Huh. I just skimmed this and quickly concluded that it's definitely not light reading.

    It sure looks and smells like good work, so I've added it to my reading list.

    Nowadays I feel like my reading list is growing faster than I can go through it.