He doesn't mention yield, which is reportedly still pretty bad on 3nm. And "just route around the bad silicon" may be easier said than done; widespread defects might leave too few good paths to move data efficiently around your wafer-chip.
If that's not factored into his 3450-wafer estimate, the real number could be double that.
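To get a feel for how much yield can move those numbers, here's a minimal Poisson yield sketch. Everything in it is an assumption: TSMC's real N3 defect density isn't public, and the ~8 cm^2 die area is just a roughly reticle-limited guess.

```python
import math

def poisson_yield(die_area_cm2: float, d0_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson model: Y = exp(-A * D0)."""
    return math.exp(-die_area_cm2 * d0_per_cm2)

wafers_claimed = 3450          # the wafer estimate from the talk
die_area = 8.0                 # cm^2, roughly reticle-limited (assumption)
for d0 in (0.05, 0.10, 0.20):  # defects/cm^2; 3nm is rumored toward the high end
    y = poisson_yield(die_area, d0)
    print(f"D0={d0:.2f}/cm^2 -> yield {y:.0%}, "
          f"wafers needed ~{wafers_claimed / y:,.0f}")
```

At D0 = 0.10/cm^2 the naive per-die yield is ~45%, which is roughly where the "double the wafers" worry comes from. Wafer-scale salvage-and-route-around changes the accounting (you harvest partial wafers instead of discarding dies), but it doesn't make defects free.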
Say what you want about him, but George is one of the few people I know who can actually go a mile wide and a mile deep into any field they choose. And he does most of his digging live on stream, so he makes an easy punching bag. Do I think he'll successfully raise a $400M round on a $2B valuation? No, something else will distract him before that - but I'm going to enjoy watching him (and pulling for him) regardless.
Brainstorming can help identify limiting assumptions.
Mostly, compute has piggybacked off consumer-scale production (e.g., GPUs repurposed for crypto).
The suggestion is that an AI model can justify few-shot chip production.
His proposal is for development, i.e., to build the model, and depends mostly on such models being qualitatively better.
It seems more likely that chips would be built to run the model on-device, instead of forcing users into a service (with its risk of confidentiality and IP leaks). To get GPT-100, you'd incorporate the chip into your device -- and then know for sure that nothing could leak. That eliminates the primary transaction cost for AI compute: the risk.
Which raises the question: does anyone know of research or companies working on such chip models?
I will refer to this as the moment when LLM hype jumped the shark.
How would you distribute the clock signal around that wafer-scale GPU? Or is he simply suggesting you buy the whole wafer and litho standard GPUs out of it? Apologies if this is a stupid question.
How does he cool those wafers?
How do these goals determine the needed ratios of memory to compute to interconnect bandwidth?
An ideal machine designed to train GPT-4 in a day is likely very different from the ideal machine to train 50 GPT-4s at once over a few weeks, which is very different from the ideal machine to train a model 100x bigger than GPT-4 (perhaps the most interesting).
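To put rough numbers on those three scenarios, here's a back-of-envelope sketch. The inputs are assumptions, not disclosed figures: ~2e25 FLOPs to train GPT-4 is an outside estimate, and 40% sustained utilization is a guess.

```python
# Order-of-magnitude throughput targets for the three machines above.
GPT4_FLOPS = 2e25  # outside estimate of GPT-4 training compute (assumption)
MFU = 0.40         # assumed sustained model-FLOPs utilization
DAY = 86_400       # seconds

scenarios = {
    "1 GPT-4 in a day":             (1 * GPT4_FLOPS,    1 * DAY),
    "50 GPT-4s in 3 weeks":         (50 * GPT4_FLOPS,  21 * DAY),
    "100x-compute model in 3 weeks": (100 * GPT4_FLOPS, 21 * DAY),
}
for name, (flops, seconds) in scenarios.items():
    sustained = flops / (seconds * MFU)
    print(f"{name}: ~{sustained:.1e} FLOP/s sustained "
          f"(~{sustained / 1e15:,.0f} PFLOP/s)")
```

The sustained-FLOP/s targets land within about 5x of each other, but the machines diverge elsewhere: 50 parallel runs can be 50 loosely coupled islands, while the one-day and 100x cases need all that compute in a single job, which is what pulls the interconnect and memory-capacity ratios apart. (And if "100x bigger" means compute-optimal, Chinchilla-style scaling, training FLOPs grow much faster than 100x.)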
Not sure about that computer plan but I do enjoy his brand of entertainment.
Also really hoping he makes more progress on AMD ML support.
Where is the budget line item for defending it against bombing by crazed anti-AI doomsday cultists?
Wyoming seems like a good place to build such a thing.
This is really cool. Godspeed!
Maybe I'm overly critical, but if fixing the Twitter search box is too hard, maybe buying solar (and batteries), fabbing chips, and building a giant data center to replicate the current generation of what another company already built is foolhardy.