Deep language algorithms predict semantic comprehension from brain activity

  • Brain scans, small sample size, sensational claims; when will we learn?

  • > First, we evaluate, for each voxel, subject and narrative independently, whether the fMRI responses can be predicted from a linear combination of GPT-2’s activations (Fig. 1A). We summarize the precision of this mapping with a brain score M: i.e. the correlation between the true fMRI responses and the fMRI responses linearly predicted, with cross-validation, from GPT-2’s responses to the same narratives (cf. Methods).

    Was this cross-checked against arbitrary inputs to GPT-2? I gather that, with 1.5 billion parameters, you can find a representative linear combination for everything.

    The Bible Code comes to mind (https://en.wikipedia.org/wiki/Bible_code).
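    The concern above can be illustrated with a toy experiment (a hypothetical sketch, not the paper's code): when a linear readout has more features than training samples, it can "predict" pure noise in-sample, which is exactly why the paper's brain score is computed on held-out data via cross-validation.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test, n_features = 100, 100, 1600  # features >> samples

    # Random "activations" and a random "voxel" signal: no real relationship
    X_train = rng.standard_normal((n_train, n_features))
    X_test = rng.standard_normal((n_test, n_features))
    y_train = rng.standard_normal(n_train)
    y_test = rng.standard_normal(n_test)

    # Minimum-norm least-squares readout (exists because n_features > n_train)
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    corr_train = np.corrcoef(X_train @ w, y_train)[0, 1]
    corr_test = np.corrcoef(X_test @ w, y_test)[0, 1]

    print(f"in-sample r = {corr_train:.2f}")   # near 1.0: fits anything
    print(f"held-out r = {corr_test:.2f}")     # near 0.0: no real signal
    ```

    So a high in-sample correlation alone would indeed be meaningless; the question is whether the reported R=0.50 survives their cross-validation honestly.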

  • > To this end, we analyze 101 subjects recorded with functional Magnetic Resonance Imaging while listening to 70 min of short stories. We then fit a linear mapping model to predict brain activity from GPT-2’s activations. Finally, we show that this mapping reliably correlates (R=0.50,p<10−15) with subjects’ comprehension scores as assessed for each story.

    Note that this is exactly the wrong way to form and attempt to refute a scientific hypothesis. The authors don't start with some new observations that require explanation, they start with a hypothesis already fully-formed ("...these models encode information that relates to human comprehension..."), and then go out and collect observations to confirm this hypothesis.

    I'm sure that if asked, the authors would say that they are simply trying to answer a scientific question, but it's obvious that they already have the answer they want and they're just trying to find data to support it. The problem of course is that if one is already convinced of the answer, one can always find evidence to "prove" it. It's a kind of confirmation bias.

  • Maybe I'm missing something, but I don't see how this correlation shows that the mapping is semantic and not, say, grammatical, syntactic, or structural in some way.

  • This is so bizarre, I wouldn't even know how to inspect, verify, and critique the claims made. Also, there is no "subject" in such simple algorithms; this is a very loose use of big words.

  • Ooh, more mind reading machines in the age of total privacy loss, nice.

  • How long before a computer can see kernels of thought before we perceive them in our mind?

  • I'm missing how this is an advance over https://www.pnas.org/doi/10.1073/pnas.2105646118

  • 101 subjects seems like not that much data to establish a correlation between GPT activations and brain activations, a correlation between brain activations and reading comprehension, and then chain them together to get an overall correlation.

  • Pretty wild finding.

    If it holds up, you could monitor kids in class, dementia patients... wild. Start your startup engines.

    Extrapolating, this also suggests Neuralink should work, and you could probably do it with less invasive tech.

  • Ah, this is truly beautiful - neuropsychology and artificial neural networks making connections.

    From the paper: “These advances raise a major question: do these algorithms process language like the human brain? Recent studies suggest that they partially do: the hidden representations of various deep neural networks have shown to linearly predict single-sample fMRI, MEG, and intracranial responses to spoken and written texts.”

  • Nature Scientific Reports has very light peer-review. If you pay the fee, they publish with high probability.

    > Manuscripts are not assessed based on their perceived importance, significance or impact

    https://www.nature.com/srep/guide-to-referees#criteria

  • Should be repeated with other LLMs for confirmation.

  • From the paper: “Specifically, we show that GPT-2’s mapping correlates with comprehension up to R=0.50. This result is both promising and limited: on the one hand, we reveal that the similarity between deep nets and the brain non-trivially relates to a high-level cognitive process. On the other hand, half of the comprehension variability remains unexplained by this algorithm.”

  • > Here, we show that the representations of GPT-2 not only map onto the brain responses to spoken stories, but they also predict the extent to which subjects understand the corresponding narratives. To this end, we analyze 101 subjects recorded with functional Magnetic Resonance Imaging while listening to 70 min of short stories. We then fit a linear mapping model to predict brain activity from GPT-2’s activations. Finally, we show that this mapping reliably correlates (R=0.50,p<10−15) with subjects’ comprehension scores as assessed for each story.
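    A minimal sketch of the "brain score" procedure the abstract describes (hypothetical shapes and toy data; ridge regression stands in for the paper's linear mapping model, see their Methods for the real pipeline): fit a linear map from activations to a voxel on training folds, predict on held-out folds, then correlate predictions with the true response.

    ```python
    import numpy as np

    def brain_score(activations, voxel, alpha=1.0, n_folds=5):
        """Cross-validated correlation between a voxel's response and its
        linear prediction from model activations (the score M)."""
        n = len(voxel)
        folds = np.array_split(np.arange(n), n_folds)
        preds = np.empty(n)
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            X, y = activations[train_idx], voxel[train_idx]
            # Ridge solution: w = (X^T X + alpha * I)^(-1) X^T y
            d = X.shape[1]
            w = np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)
            preds[test_idx] = activations[test_idx] @ w
        return np.corrcoef(preds, voxel)[0, 1]

    # Toy data: a voxel that is a noisy linear readout of the activations
    rng = np.random.default_rng(0)
    acts = rng.standard_normal((500, 32))   # time points x hidden units
    true_w = rng.standard_normal(32)
    vox = acts @ true_w + 2.0 * rng.standard_normal(500)

    print(f"brain score M = {brain_score(acts, vox):.2f}")
    ```

    The cross-validation is the load-bearing part: with it, a high score requires the activations to carry genuine predictive signal, rather than just enough free parameters to fit the training data.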