fMRI-to-image with contrastive learning and diffusion priors

  • Wasn't there something similar on HN a few months ago, where the top comment talked about how it's not as impressive as it sounds [0]? The main issue is that this type of methodology pulls from a pool of images rather than directly reconstructing the image the brain saw (a minimal sketch of the distinction follows after the links below).

    > I immediately found the results suspect, and I think I have found what is actually going on. The dataset it was trained on was 2,770 images, minus 982 of those used for validation. I posit that the system did not actually read any pictures from the brains, but simply overfitted all the training images into the network itself. For example, if you looked at a picture of a teddy bear, you'd get an overfitted picture of another teddy bear from the training dataset instead.

    > The best evidence for this is a picture(1) from page 6 of the paper. Look at the second row. The buildings generated by 'mind reading' subjects 2 and 4 look strikingly similar to each other, but not very similar to the ground truth! From manually combing through the training dataset, I found a picture of a building that does look like that, and after scaling it down and cropping it exactly in the middle, it overlays rather closely(2) on the output that was ostensibly generated for an unrelated image.

    > If so, at most they found that looking at similar subjects lights up similar regions of the brain, and putting Stable Diffusion on top of that serves no purpose. At worst, it's entirely cherry-picked coincidences.

    > 1. https://i.imgur.com/ILCD2Mu.png

    > 2. https://i.imgur.com/ftMlGq8.png

    [0] https://news.ycombinator.com/item?id=35012981
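
    To make the retrieval-vs-reconstruction distinction concrete, here is a minimal Python sketch. Everything in it (the linear decoder, the embedding space, the names) is illustrative, not the paper's actual method: the point is that if a model has effectively memorized its training set, "reconstruction" collapses into this kind of nearest-neighbor lookup.

      import numpy as np

      def decode_to_embedding(fmri_voxels: np.ndarray, W: np.ndarray) -> np.ndarray:
          # Hypothetical linear decoder: voxel activity -> image-embedding space
          return fmri_voxels @ W

      def retrieve_nearest(embedding, train_embeddings, train_images):
          # Return the closest TRAINING image by cosine similarity. This is
          # retrieval, not synthesis: a teddy-bear stimulus yields the training
          # set's most similar teddy bear, never an image outside the pool.
          e = embedding / np.linalg.norm(embedding)
          T = train_embeddings / np.linalg.norm(train_embeddings, axis=1, keepdims=True)
          return train_images[int(np.argmax(T @ e))]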

  • This is SO COOL. I'd guess (I did analysis for an fMRI lab for a year, so I'm not a pro but not totally talking out of my orifice) that detecting images like this is among the easier things you could do (it probably wouldn't be so easy to do things like "guess the words I'm thinking of"), and I suspect other sensory stuff might be harder, but I have little knowledge there.

    One of the biggest issues with any attempt to extract information from an fMRI scan is resolution, both spatial and temporal. This study used 1.8mm voxels, which is a TON of neurons (also recall that fMRI scans blood flow, not neuron activity - we just count on those things being correlated). Temporally, fMRI sample frequencies are often <1 Hz. I didn't see that they mentioned a specific frequency, but they showed images to the subject for 3 seconds at a time, so I'd guess that's designed to ensure you get at least a frame or three while the subject is looking at the image. You can sort of trade voxel size for sample frequency - so you can get more voxels, or more samples, but not both. So detecting things that happen quickly (like, say, moving images or speech) would probably be quite hard (even if you could design an AI thingy that could do it, getting the raw data at the resolution you'd need is not currently possible with existing scanners).
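
    To put "1.8mm voxels is a TON of neurons" in rough numbers - the density figure below is an assumption, and published cortical estimates vary quite a bit by region and counting method:

      voxel_edge_mm = 1.8
      voxel_volume_mm3 = voxel_edge_mm ** 3      # ~5.8 mm^3
      neurons_per_mm3 = 25_000                   # rough cortical ballpark (assumed)
      print(f"~{voxel_volume_mm3 * neurons_per_mm3:,.0f} neurons per voxel")
      # -> roughly 150,000 neurons summed into a single BOLD measurement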

    Also, not all brain functions are as clearly localized as vision - the visual cortex areas in the back of the brain map pretty directly to certain kinds of visual stimulus, while other kinds of stimulus and activity are much less localized (there isn't a clear "lighting up" of an area). You can get better resolution if you only scan part of the brain (e.g. the visual cortex - I don't know if that's what they did for this study), but that's obviously only useful for activity happening in a small part of the brain.
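
    At analysis time, the analogous step is masking down to visual-cortex voxels before decoding. A hedged sketch with nilearn - the mask and BOLD filenames are placeholders, and in practice the mask would come from a retinotopic localizer or an atlas:

      from nilearn.maskers import NiftiMasker

      masker = NiftiMasker(mask_img="vc_mask.nii.gz", standardize=True)
      # Result has shape (n_timepoints, n_voxels_in_mask): far fewer features
      # than whole-brain, which helps any downstream decoder.
      voxel_timeseries = masker.fit_transform("run1_bold.nii.gz")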

    ANYWAY SO COOL!!! I wonder if you could use this to draw people's faces with a subject who is imagining looking at a face? fMRI police sketch? How do brains even work!?

  • Human communication will change dramatically once useful invasive brain-computer interfaces are available.

    People will suddenly realize that the reason language is primarily serial is simply that it must be conveyed as a series of sounds. There will likely be a new type of visual language used via BCI "telepathy". It may have some ordering, but it will not rely so heavily on serializing information, since the world is quite multidimensional.

  • For context, early vision is easier to map than you might expect.

    Here's an autoradiograph of the primary visual cortex created in 1982 by projecting a pattern onto a macaque's retina: https://web.archive.org/web/20100814085656im_/http://hubel.m...

    An injection of radioactive sugar lets you see where the neurons were firing away and metabolizing the sugar.

    (https://pubmed.ncbi.nlm.nih.gov/7134981/)

  • Some important points from the article's limitations section:

    - Each participant in the dataset spent up to 40 hours in the MRI machine to gather sufficient training data.

    - Models were trained separately for every participant and are not generalizable across people.

    - Image limitations: MindEye is limited to the kinds of natural scenes used for training the model. For other image distributions, additional data collection and specialized generative models would be needed.

    - … Non-invasive neuroimaging methods like fMRI not only require participant compliance but also full concentration on following instructions during the lengthy scan process. …

  • Awesome - predicting words from fMRI has been around for a while, and the visual cortex can be mapped well.

    That said, and coming from a background in neuroimaging 20 years ago: what’s the applicability? MRI hasn’t gotten that much more cost-effective for more widespread use. Magnets are expensive.

  • What if a copyrighted image or video can be recovered from your brain using external tech like this? That's not fair to the rights holders. What we really need is technology that can clean such illegal memories from the brain; a brainwasher, if you will.

  • Aside from helping those with disabilities, what is stopping authorities using this as a lie detector? I assume the tech isn’t quite there yet.

  • Techniques like this give me hope that some day we will be able to objectively diagnose mental illness, and monitor the efficacy of treatment.

  • As someone suffering from intrusive thoughts, I do not look forward to a future where other people can see what I sometimes see in my head.

  • I think the method of merging the pipelines via img2img should use ControlNet, possibly fine-tuned specifically for this, although existing ControlNet models might work fine.

    This is exactly what you'd want to use ControlNet for - mapping semantic information onto the perceived structure. A rough sketch of the idea follows.
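
    A hedged sketch (not the paper's actual pipeline) of what that merge could look like with Hugging Face diffusers. The checkpoints, the Canny control, and the placeholder files/prompt are all assumptions; a ControlNet fine-tuned on fMRI-decoded blur rather than edge maps would likely condition better:

      import cv2
      import numpy as np
      import torch
      from PIL import Image
      from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
      from diffusers.utils import load_image

      controlnet = ControlNetModel.from_pretrained(
          "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
      )
      pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5",
          controlnet=controlnet,
          torch_dtype=torch.float16,
      ).to("cuda")

      # "low_level.png" stands in for the blurry structural reconstruction
      # decoded from the fMRI signal.
      low_level = load_image("low_level.png")
      edges = cv2.Canny(np.array(low_level), 100, 200)
      control = Image.fromarray(np.stack([edges] * 3, axis=-1))

      result = pipe(
          prompt="a teddy bear on a bed",  # placeholder for decoded semantics
          image=low_level,                 # img2img init keeps coarse colors/layout
          control_image=control,           # edge map enforces structure
          strength=0.8,
          num_inference_steps=30,
      ).images[0]
      result.save("merged.png")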

  • They should dose people with DMT in the fMRI and run it through the model.

  • There was also DreamDiffusion recently. https://arxiv.org/abs/2306.16934

  • This is cool. We are heading towards a future like the one shown in Inception, where an idea can be planted.

  • The best reconstructions I've seen so far are from Jack Gallant's and Alex Huth's labs (at least among what was shown publicly at SfN).

  • This is mind reading, we are in the future.