Interesting, I guess that is why I never got a response back from them about buying their stuff.
My guess is that they realized that just selling hardware is a lot harder than running it themselves. Deploying this level of compute is non-trivial, with very high rates of failure, as well as huge supply chain issues. If you have to sell the hardware and support people buying it, that is a world of trouble.
> no-one wants to take the risk of buying a whole bunch of hardware
I do!
Nobody has stated it yet, but this is probably great news for Tenstorrent.
Disclosure: building a cloud compute provider starting with AMD MI300x, and eventually any other high end hardware that our customers are asking for.
They're giving the lie to the claim that you need bleeding-edge hardware for performance.
Five-year-old silicon (14 nm!!) and no HBM.
Their secret sauce seems to be an ahead-of-time compiler that statically lays out the entire computation, enabling zero contention at runtime. Basically, they stamp out all non-determinism.
Custom state-of-the-art silicon is ridiculously expensive.
For a minimum run of 100 wafers (~10k chips), Groq may have paid ~$100M, i.e., ~$10k/chip, purely in amortized design costs.
Chip design (software + engineer time) and fabrication setup (lithography masks) grow exponentially [1][2] with smaller nodes, e.g., maybe $100M for Groq's current 14nm chips versus ~$500M for their planned 4nm tapeout. Once you reach mass production (>>1000 wafers, at ~150 large chips each), wafers are ~$10k apiece. On top of this, it takes ~1 year to design and then have prototypes made. (The same issues exist on older, slower nodes, albeit not as severely.)
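A quick back-of-envelope check of those numbers (all figures are the rough estimates above, not official Groq data):

```python
# Amortizing chip NRE (non-recurring engineering: design + mask setup)
# over a minimum production run. Figures are rough guesses, not official.
nre = 100_000_000            # ~$100M design + fab-setup cost at 14nm
wafers = 100                 # minimum run
chips_per_wafer = 100        # implied by "100 wafers = 10k chips"
nre_per_chip = nre / (wafers * chips_per_wafer)
print(f"NRE per chip: ${nre_per_chip:,.0f}")  # $10,000

# At mass production, the marginal wafer cost dominates instead:
wafer_cost = 10_000          # ~$10k per wafer
mass_chips_per_wafer = 150   # ~150 large chips per wafer
print(f"Marginal cost per chip: ${wafer_cost / mass_chips_per_wafer:,.0f}")  # ~$67
```

At volume, the per-chip cost collapses from NRE-dominated (~$10k) to wafer-dominated (~$67), which is why the minimum-run economics are so punishing.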
This could be reduced somewhat if chip design software were cheaper and margins were lower, but maybe 20% of this cost is due to fundamental manufacturing difficulty.
(disclosure: I don't work with recent tech nodes myself; this is my best guess)
[1] https://www.semianalysis.com/p/the-dark-side-of-the-semicond... [2] https://www.extremetech.com/computing/272096-3nm-process-nod...
The smoke and mirrors around Groq are finally clearing. The truth is that their system is insanely expensive to maintain: hundreds of chips (>500, IIRC) to get wild tokens/s, but the power and maintenance expense is crazy high for that many chips. The TCO just isn't worth it.
Sounds like they're looking to get bought up to me. I'm sure they could monetize their current hardware, and build to sell just like other niche hardware vendors. Anyone remember the hype around big "cloud" storage boxes 10 years back?
I'm not able to get consistent replies from the API. It's lightning fast for like ten minutes and then starts freezing up for several seconds.
I want to use it, but it's been very unreliable. I have been using Claude 3 and thinking about together.ai with Mixtral.
This business model is bound to get attacked and suffer a painful exit soon. Here's why:
First, the system-of-chips architectures everyone is talking about will increase the overall SRAM available, keeping more model state in very fast memory and avoiding trips to slow memory.
Second, anyone serious about their data (enterprises) won't be okay with making API calls to Groq. Anyone serious about their data who also operates at scale (consumer internet) won't be okay with making expensive API calls to Groq at scale, either.
Their cloud is attractive only as an API for experimental toy apps, letting me keep developing in this direction while the major industry players' system-of-chips architectures catch up and solve the SRAM-size and manufacturing-process bottlenecks; once that's solved, I get more powerful compute for less money to deploy on-prem.
So, this cloud strategy is short-lived. I see another pivot on the horizon.
Totally saw this one coming! [1]
I think one major challenge they'll face is that their architecture is incredibly fast at running the ~10-100B parameter open-source models, but starts hitting scaling issues with state-of-the-art models. They need 10k+ chips for a GPT-4-class model, but their optical interconnect only supports a few hundred chips.
[1] https://www.zach.be/p/why-is-everybody-talking-about-groq
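To see why the chip count balloons: Groq serves model weights entirely from on-chip SRAM (~230 MB per chip, per Groq's published specs). The model sizes below are rough public estimates, and this counts only weight storage, ignoring activations and redundancy:

```python
import math

SRAM_PER_CHIP_GB = 0.23  # ~230 MB of on-chip SRAM per chip (published spec)

def chips_for_weights(params_billions: float, bytes_per_param: int = 2) -> int:
    """Minimum chips needed just to hold the weights in SRAM (fp16)."""
    weights_gb = params_billions * bytes_per_param
    return math.ceil(weights_gb / SRAM_PER_CHIP_GB)

print(chips_for_weights(70))    # Llama-70B-class: ~609 chips
print(chips_for_weights(1800))  # GPT-4-class (rumored ~1.8T params): ~15,653 chips
```

A 70B model already needs hundreds of chips; a rumored GPT-4-scale model pushes well past 10k, beyond what a few-hundred-chip interconnect domain can hold.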
Given that their hardware is different I can kinda see how they don’t want to deal with supporting customers.
> what do you mean I can’t just drop a CUDA docker image in?
So unless there are new Groq datacenters coming, this is only interesting for North American users. Otherwise, H100-based latency-optimized solutions would be faster, in particular for time-to-first-token-sensitive applications.
That sucks. I wanted to save up for a couple of years and get some hardware for home, but I guess the "AI" space moves so fast you barely get a couple of months.
Man, I want to appreciate a nice new hardware approach, but they say such BS that it is hard to read about them:
> “There might need to be a new term, because by the end of next year we’re going to deploy enough LPUs that compute-wise, it’s going to be the equivalent of all the hyperscalers combined,” he said. “We already have a non-trivial portion of that.”
Really? Does anyone seriously believe they are going to be the equivalent of all hyperscalers in compute next year? (Where Meta alone is at 1 million H100 equivalents.) In the same article where they say it's too hard for them to sell chips? And when they literally don't have a setup to even accept a credit card today?
Wildly fast inference. And the current chips are 14 nm, so there's headroom to get a lot better.
IMHO, Groq is being shadow acquired by Google
> If customers come with requests for high volumes of chips for very large installations, Groq will instead propose partnering on data center deployment. Ross said that Groq has “signed a deal” with Saudi state-owned oil company Aramco, though he declined to give further details, saying only that the deal involved “a very large deployment of [Groq] LPUs.”
What? How does this make sense?
Read: We're forcing someone's hand in acquiring us.
Groq is still under a 30-requests-per-minute rate limit, which drops to 10 requests per minute if you sustain all-day usage.
Billing has been "coming soon" this whole time, and while they've built out hype-enabling features like function calling, somehow they can't set up a Stripe webhook to collect money for realistic rate limits.
They couldn't scream "we can't service the tiniest bit of our demand" any louder at this point.
Edit: For anyone looking for fast inference without the smoke and mirrors, I've been using Fireworks.ai in production and it's great. 200-300 tok/s is closer to Groq than it is to OpenAI and co.
And as a bonus they support PEFT with serverless pricing.
Another casualty of AI KYC.
I don't understand why the comments are trash-talking Groq. They are the fastest LLM inference provider by a big margin. Why would they sell their hardware to any other company for any price? Keep it all for themselves and take over the market. 95% of my LLM requests go to Groq these days because it's 0.25 seconds round trip for a complete answer. In comparison, "Claude Instant" takes about 4 seconds. The other 5% of my requests go to Claude Opus and GPT-4, when I'm willing to wait an excruciating 5+ seconds for a better answer. I hate waiting. Latency is king. Groq wins.