Did anyone else notice the absolutely insane author lists of references 1 and 3?
I was expecting to find this 2016 article in there: https://news.ycombinator.com/item?id=12469270
This 2019 one does show up: https://news.ycombinator.com/item?id=22712811
Of course, this "out of spec" behaviour of DRAM, more specifically the ability to do copying, is also implicated in this infamous bug: https://news.ycombinator.com/item?id=5314959
It seems more than one person independently observed such a thing, and thought "this might be a useful behaviour".
> By intentionally issuing DRAM commands that violate manufacturer-specified timing parameters.. [gaining] massive parallelism up to 65,536 bitwise operations in parallel.
Take that, binary blobs for DRAM training!
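For anyone trying to picture the parallelism claim: the unit of work is an entire subarray row, so a single (deliberately mis-timed) command sequence acts on all 65,536 bit positions at once. Below is a minimal software model of just that data layout, with NumPy standing in for the sense amplifiers; it's a sketch of the idea, not the paper's actual command sequence.

    # Toy model: one "in-DRAM operation" = one element-wise op over a
    # 65,536-bit row. Only the layout is modelled here; the real hardware
    # does this via charge sharing under out-of-spec timing, not via NumPy.
    import numpy as np

    ROW_BITS = 65_536  # one subarray row, per the quoted figure

    rng = np.random.default_rng(0)
    row_a = rng.integers(0, 2, ROW_BITS, dtype=np.uint8)  # operand row A
    row_b = rng.integers(0, 2, ROW_BITS, dtype=np.uint8)  # operand row B

    # In hardware this would be a handful of carefully timed activate /
    # precharge commands; here it's just one vectorised AND over the row.
    row_out = row_a & row_b

    print(f"{ROW_BITS} bitwise ANDs done in one row-wide operation")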
This is just mind-bendingly weird and wonderfully creative. It can pay to work in the weeds! Bravo.
They're doing matrix operations in the DRAM itself? That sounds insane and also fascinating.
In the hardware world, are there risks to taking advantage of a bug, knowing that the manufacturer may someday fix it? I know in the software world it's a bad idea to leverage a bug in a platform to enable a feature (or fix another bug): the bug you're counting on may get fixed 15 years in the future, and then your system explodes and no one knows why.
edit: seems like there was a recent discussion about something similar... undefined behavior in some C function iirc
> General matrix-vector multiplication (GeMV)
OK, so my math isn't great.
When I was studying quaternions in my 3D math class (which I failed the first time; like I said, not a math guy), they briefly covered the history of matrix calculation in graphics development.
My understanding is that quaternions became popular because they are almost as accurate as matrices but much less computationally complex.
Has anyone tried building an LLM using quaternions instead of matrices?
Or are the optimisations from quaternions more useful in real-time graphics?
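If it helps to see why quaternions don't transfer: a unit quaternion is a compact encoding of one very specific linear map, a 3-D rotation (4 numbers instead of a 3x3 matrix, and cheaper to compose and renormalise). A transformer's GeMV is an arbitrary N-dimensional linear map with no rotation structure to exploit, so there's nothing for a quaternion to compress. A quick sketch of the equivalence (standard identities, nothing from the paper):

    # A unit quaternion and its 3x3 rotation matrix are the same map; the
    # quaternion just stores it in 4 numbers. GeMV in an LLM is a general
    # N x N map, so this trick doesn't apply there.
    import numpy as np

    def quat_rotate(q, v):
        """Rotate 3-vector v by unit quaternion q = (w, x, y, z)."""
        w, x, y, z = q
        u = np.array([x, y, z])
        return v + 2 * w * np.cross(u, v) + 2 * np.cross(u, np.cross(u, v))

    def quat_to_matrix(q):
        """The equivalent 3x3 rotation matrix."""
        w, x, y, z = q
        return np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])

    theta = np.pi / 3
    q = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])  # 60 deg about z
    v = np.array([1.0, 0.0, 0.0])

    assert np.allclose(quat_rotate(q, v), quat_to_matrix(q) @ v)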
A bit unscientific that they don't cite the original Intelligent RAM (IRAM) sources from 1997:
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=iram...
Can we expect to see matrix multiplication, and perhaps other ops, move out of classic CPUs into the DRAM itself, eventually with deliberate hardware support?
And does such a processing shift give an advantage to DRAM makers like Samsung? Where does this leave the likes of NVIDIA?
Funny hack. Without having read the paper I'd assume the operations to be thermally unstable. So LLM inference results will vary based on environmental temperature :-)
This would be a cool way to make a cheap inference device for the largest LLMs.
So is this a new technique of doing computations within existing DRAM to overcome the memory wall issue of modern computing?
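Pretty much. The point of attacking GeMV specifically is that it's about as memory-bound as an operation gets: every weight is fetched once and used for exactly one multiply-add, so performance is capped by DRAM bandwidth rather than compute. Rough numbers (illustrative, not from the paper):

    # Back-of-the-envelope arithmetic intensity of GeMV (y = A @ x).
    M, N = 4096, 4096            # a hypothetical weight matrix
    bytes_per_weight = 2         # fp16

    flops = 2 * M * N                        # one multiply + one add per weight
    bytes_moved = M * N * bytes_per_weight   # the whole matrix crosses the bus

    print(flops / bytes_moved, "FLOP per byte")  # ~1 -> bandwidth-bound

    # At ~100 GB/s of DRAM bandwidth that caps GeMV around ~100 GFLOP/s,
    # however fast the cores are -- hence the appeal of doing the reduction
    # where the weights already live.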
Some more background information:
One of the original proposals for in-DRAM compute (the triple-row-activation AND/OR idea; see the sketch after this list): https://users.ece.cmu.edu/~omutlu/pub/in-DRAM-bulk-AND-OR-ie...
First demonstration with off-the-shelf parts: https://parallel.princeton.edu/papers/micro19-gao.pdf
DRAM Bender, the tool they are using to implement this: https://github.com/CMU-SAFARI/DRAM-Bender
Memory-Centric Computing: Recent Advances in Processing-in-DRAM: https://arxiv.org/abs/2412.19275
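On that first link: as I understand the mechanism, simultaneously activating three rows makes each sense amplifier settle to the bitwise majority of the three cells, and pre-loading the third "control" row with all 0s or all 1s turns that majority into AND or OR of the other two rows. A tiny software model of that logic (no real timing violations involved, and the row names are mine):

    # MAJ(a, b, c) per bit is what triple-row activation resolves to;
    # c = 0 gives AND(a, b), c = 1 gives OR(a, b).
    import numpy as np

    def triple_row_activate(row_a, row_b, row_c):
        return (row_a & row_b) | (row_b & row_c) | (row_a & row_c)

    rng = np.random.default_rng(1)
    a = rng.integers(0, 2, 8, dtype=np.uint8)
    b = rng.integers(0, 2, 8, dtype=np.uint8)
    zeros = np.zeros(8, dtype=np.uint8)   # control row preset to 0 -> AND
    ones = np.ones(8, dtype=np.uint8)     # control row preset to 1 -> OR

    assert np.array_equal(triple_row_activate(a, b, zeros), a & b)
    assert np.array_equal(triple_row_activate(a, b, ones), a | b)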