Show HN: San Francisco Compute – 512 H100s at <$2/hr for research and startups

  • I hope you succeed. TPU research cloud (TRC) tried this in 2019. It was how I got my start.

    In 2023 you can barely get a single TPU for more than an hour. Back then you could get literally hundreds, with an s.

    I believed in TRC. I thought they’d solve it by scaling, and building a whole continent of TPUs. But in the end, TPU time was cut short in favor of internal researchers — some researchers being more equal than others. And how could it be any other way? If I made a proposal today to get these H100s to train GPT to play chess, people would laugh. The world is different now.

    Your project has a youthful optimism that I hope you won’t lose as you go. And in fact it might be the way to win in the long run. So whenever someone comes knocking, begging for a tiny slice of your H100s for their harebrained idea, I hope you’ll humor them. It’s the only reason I was able to become anybody.

  • > Rather than each of K startups individually buying clusters of N gpus, together we buy a cluster with NK gpus... Then we set up a job scheduler to allocate compute

    In theory, this sounds almost identical to the business model behind AWS, Azure, and other cloud providers: "Instead of everyone buying a fixed amount of hardware for individual use, we'll buy a massive pool of hardware that people can time-share." Beyond cloud providers marking up prices to earn a margin, is there something else they're failing to do that creates the need for projects like this?
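
    One way to see the economics (a toy model; every number below is an illustrative assumption, not a figure from the post): pooling doesn't change average demand, but it turns idle dedicated capacity into burst headroom.

      # Toy model of dedicated vs. pooled GPU clusters (made-up numbers).
      K = 8               # startups
      N = 64              # GPUs each startup would buy on its own
      duty_cycle = 0.25   # fraction of time each startup is actually training

      # Dedicated: everyone buys N GPUs, most of which sit idle.
      dedicated_total = K * N                      # 512 GPUs purchased
      avg_in_use = dedicated_total * duty_cycle    # 128 GPUs busy on average
      utilization = avg_in_use / dedicated_total   # 0.25

      # Pooled: the same 512 GPUs behind a scheduler serve the same average
      # load, but any one startup can burst past N when the others are idle.
      max_burst_dedicated = N      # 64, capped by what you bought
      max_burst_pooled = K * N     # 512, capped by the whole pool

      print(utilization, max_burst_dedicated, max_burst_pooled)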

  • Having hosted infrastructure in CA at multiple colos, I would advise you to host it elsewhere if you can; the cost of power and other infrastructure is much higher in CA than in AZ or NV.

  • > It's just that no cloud provider in the world will give you $100k of compute for just a couple weeks

    I've never had to buy very large compute, but I thought that was the whole point of the cloud.
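
    For scale, some back-of-the-envelope arithmetic using only the post's headline numbers ($100k budget, <$2/GPU-hr, "a couple weeks"):

      # What "$100k of compute for a couple weeks" buys at the advertised rate.
      rate = 2.0                 # USD per H100-hour (the post's headline price)
      budget = 100_000           # USD
      hours = 14 * 24            # two weeks of wall-clock time

      gpu_hours = budget / rate  # 50,000 H100-hours
      gpus = gpu_hours / hours   # ~149 H100s running continuously
      print(gpu_hours, gpus)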

  • How does this compare to https://lambdalabs.com/ ?

  • I am super interested in AI on a personal level and have been involved for a number of years.

    I have never seen a GPU crunch quite like the one right now. To anyone interested in hobbyist ML, I highly, highly recommend using vast.ai.

  • I know AWS/GCP/Azure have overhead and I understand why so many companies choose to go bare metal on their ops. I personally rarely think it's worth the time and effort, but I get that at scale the savings can be substantial.

    But for AI training? If the public cloud isn't competitive even for bursty AI training, their margins are much higher than I anticipated.

    OP mentions a 10-20x cost reduction, but compared to what? AWS?

  • Hi, SF lover [1] here. Anything interesting to note about your name? Will your hardware actually be based in SF? Any plans to start meetups or bring customers together for socializing or anything like that?

    [1] We have not gone the way of the Xerces blue [2] yet... we still exist!

    [2] https://en.wikipedia.org/wiki/Xerces_blue

  • I love the idea of community assets. Could it be the start of a GPU co-op?

  • How did you get the money to buy 512 H100s?

  • Noob thought: so this would be a blueprint for how mid-tier universities with older large compute clusters could do things in 2023 to support large LLM research?

    Perhaps it's also a way for prospective grad students to identify universities looking to do LLM research at the scale it requires...

  • 554 5.7.1 <evan@sfcompute.org>: Relay access denied

    554 5.7.1 <alex@sfcompute.org>: Relay access denied

  • Correct me if I’m wrong, but doesn’t Lambda Labs already provide them at $1.89? What’s the point if you’re not the cheapest from the start?

  • Nat Friedman and Daniel Gross set up a 2,512 H100 cluster [1] for their startups, with a very similar “shared” model. Might be interesting to connect with them.

    [1] https://andromedacluster.com/

  • Will it be a Slurm cluster, or what kind of scheduler is SFC planning to use?
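
    If it does end up being Slurm, a job submission might look roughly like this (a sketch only; the partition name, node counts, and time limit are assumptions, not anything SFC has announced):

      #!/bin/bash
      #SBATCH --job-name=train-llm
      #SBATCH --partition=h100        # hypothetical partition name
      #SBATCH --nodes=4               # 4 nodes x 8 H100s = 32 GPUs
      #SBATCH --gres=gpu:8            # GPUs requested per node
      #SBATCH --ntasks-per-node=8     # one task per GPU
      #SBATCH --time=48:00:00         # wall-clock limit

      srun python train.py --config config.yaml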

  • Wishing y'all the best of luck. This would be huge for a lot of folks.

  • What kind of hardware setup are you planning out? Colocation, roll-your-own data center, something in between? Any thoughts on what servers the GPUs will be housed in?

  • Honest question I don’t know how to weigh: are we further along or behind with AI given crypto’s use of GPUs? Have the same cards bought for mining furthered AI, or did that demand lead to more research into GPUs and what they can do? Or would we be further along if we weren’t wasting those cards on mining?

  • How are you going to sell access and divide the resources?

  • Just curious, do you guys use renewable energy to power your cluster?

  • I love this. We at Phind.com would love to be a part of this.

  • During a gold rush, sell shovels.

    When was the last time you spoke to a chatbot?

  • "Once the cluster is online ..."

    Where will the cluster be hosted?

    May I suggest that you get your IP transit from he.net?

  • The billion dollar question is:

    Who is funding this?

    Because if it’s VC money, it’s going to meet the same fate as everything else after 5-7 years.

    I hope y’all have an equally innovative business model. You’ll need one if you want to keep doing what you’re doing for more than a few years.

  • Please take this question without prejudice.

    Is it accurate to say you’re willing to go ~20,000,000 USD into debt to sell discounted compute-as-a-service to researchers/startups, but unwilling to go into debt to sponsor the undergraduate degrees of ~100-500 students at top-tier schools? (40k-200k USD per degree)

    Or, you know, build and fund a small public school/library or two for ~5 years?