I recently ported this to Metal for Apple Silicon computers. If you're interested in learning GPU programming on an M series Mac, I think this is a very accessible option. Thanks to Sasha for making this!
I think this course is also relevant for some deeper context.
https://gfxcourses.stanford.edu/cs149/fall23/lecture/datapar...
When working on GPU code there are really two parts to it, I feel. One is "how do I even write code for the GPU," which this tutorial seems to cover, but there's a second part, "how do I write good code for the GPU," which seems like it would need another resource or an expansion of this one.
It would be nice if the puzzles natively supported C++ CUDA.
I loved the tensor puzzles you made. I spent the morning revisiting and liking all the videos you've made on YouTube. Hoping for many more in the future!
Either puzzle 4 has a bug in it or I'm losing my mind. (A possible solution follows, so stop reading if you want to go in fresh.)
# FILL ME IN (roughly 2 lines)
if local_i < size and local_j < size:
    out[local_i][local_j] = a[local_i][local_j] + 10
Results in a failed assertion: AssertionError: Wrong number of indices
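Not a definitive diagnosis, but if the puzzles' test harness mimics NumPy/Numba semantics, the chained brackets `out[local_i][local_j]` may be what trips the index check: 2D tensors there are indexed with a single tuple, `out[local_i, local_j]`. A minimal pure-Python sketch of the difference, using a toy tensor class to stand in for the puzzle's (the names `size`, `local_i`, `local_j` follow the puzzle; everything else here is illustrative):

```python
# Toy stand-in for the puzzle's 2D tensor: like NumPy / Numba device
# arrays, it expects a single tuple index, t[i, j], not t[i][j].
class Tensor2D:
    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __getitem__(self, idx):
        i, j = idx  # fails unless exactly two indices are given at once
        return self.rows[i][j]

    def __setitem__(self, idx, val):
        i, j = idx
        self.rows[i][j] = val


size = 2
a = Tensor2D([[1, 2], [3, 4]])
out = Tensor2D([[0, 0], [0, 0]])

# Simulate a 3x3 grid of threads, as in the puzzle's guard condition.
for local_i in range(3):
    for local_j in range(3):
        if local_i < size and local_j < size:
            # Tuple indexing: out[i, j], not out[i][j].
            out[local_i, local_j] = a[local_i, local_j] + 10

print(out.rows)  # [[11, 12], [13, 14]]
```

With chained brackets, `out[local_i]` alone would hand `__getitem__` a single index, which is consistent with a "Wrong number of indices" assertion.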
But the test cell beneath it will still pass?

So I'm used to working with lists and maps, which doesn't really track well with tackling problems on thousands of cores.
Is the usual strategy to worry less about repeating calculations and just use brute force to tackle the problem?
Is there a good resource to read about how to tackle problems in an extremely parallel way?
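Not an authoritative answer, but a common mindset is: accept some redundant per-element work in exchange for avoiding cross-core coordination, and express the shared steps as data-parallel passes. A pure-Python sketch of a tree reduction, the staple pattern for combining results from thousands of cores in O(log n) steps (the "threads" here are just loop iterations simulated sequentially):

```python
# Toy tree reduction: what many GPU threads would do concurrently,
# simulated one step at a time. Each step halves the active values.
def tree_sum(values):
    vals = list(values)
    stride = 1
    while stride < len(vals):
        # In a real kernel every index i hit by this inner loop would
        # run in parallel; sequentially it still gives the same result.
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]

print(tree_sum(range(8)))  # 28, reached in log2(8) = 3 parallel steps
```

The point is that n - 1 additions still happen, but they are arranged so no value is touched by two "threads" in the same step, which is the shape most GPU algorithms aim for.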
Wow, it looks really interesting. I will definitely look into it.
Can I hire you to make Flash Attention a reality for V100?
Looks nice and fun but the "see-through" font for the titles in the screenshots gives me some deep and primordial unease, not sure why.
Seems like an opportune moment to give a plug for Bitcoin puzzles, namely the BTC32 / 1000 BTC Challenge[1].

Pools are in dire need of CUDA developers.
I made these a couple of years ago as a teaching exercise for https://minitorch.github.io/. At the time, the resources for doing anything on GPUs were pretty sparse and the NVIDIA docs were quite challenging.

These days there are great resources for going deep on this topic. The CUDA MODE org is particularly great, both for its video series and its PMPP (Programming Massively Parallel Processors) reading groups.