Andromeda Cluster: 10 Exaflops* for Startups from Nat Friedman and Daniel Gross

  • Forget LINPACK and friends. Jack Dongarra is going to need to switch to the new metric for supercomputers, kilograms of H100 GPUs: about 3,300, give or take a few grams, for this system.

  • > For use by startup investments of Nat Friedman and Daniel Gross

    > Reach out if you want access

    I'm confused by the last two bullet points. Is this website only meant to be used by these "startup investments" or can anyone fill out the linked form?

  • Can the creators explain in more detail: how is this different from (for example) the OpenAI cluster that MSFT built in Azure? Is it hosted in an existing cloud provider, or in a data center? Which data center? Who admins the system, and is there an SRE team in case it goes down during training? And can you attempt to run the same benchmarks that Top500 uses to determine what your double-precision flops are, and give that number in addition to your "10 exaflops" (which I believe is single precision)?

  • Emad from Stability estimates this at 4M/month. https://twitter.com/emostaque/status/1668666509298745344

  • lmao are they trolling with the naming

    https://www.cerebras.net/andromeda/

  • The mainframe is dead. Long live the new mainframe. We just call it DGX because it's cool.

    Even the leasing model has made a comeback!

  • Same guys behind https://aigrant.org, maybe it's mainly as a way to get dealflow?

  • Would be very cool to see some pictures of the cluster; sounds like an impressive build!

  • Someone please start the GPT 5 training before the regulation kicks in.

  • Looks like they've reserved a bunch of compute from Lambda Labs?

    Edit: Based off this tweet, looks very similar https://twitter.com/LambdaAPI/status/1668676838044868620

  • > Big enough to train llama 65B in ~10 days

    Y'all could totally eat Meta's lunch and train an open LLM with all the innovations that have come since LLaMA's release. Other startups are trying, but they all seem bottlenecked by training time/resources.

    This could be where the next Stable Diffusion 1.5 comes from.

  • The new lean startup. 3,291 kg
