All You Need Is 4x 4090 GPUs to Train Your Own Model

  • This is a great build, thanks for sharing your learnings.

    The best build I have seen so far had 6x 4090s. Video: https://www.youtube.com/watch?v=C548PLVwjHA

      Specifications
      - GPU Accelerator - 6 x 24GB NVIDIA GeForce RTX 4090
      - Processor - Intel Xeon W7-3465X, 28C/56T, 2.5GHz - 4.8GHz
      - Memory - 256GB (8x32GB) DDR5 ECC 4800MHz
      - System Drive  - 2TB Samsung 980 PRO NVMe PCIe 4.0 M.2 SSD
      - Storage Drive - 4TB Samsung 870 EVO SSD
      - Operating System - Ubuntu 20.04
    
    An interesting choice to go with 256GB of DDR5 ECC; if you're spending so much on six 4090s, you might as well try to hit 1 TB of RAM as well.

    The cost of this... not even sure. Astronomical.

  • This article was written or rewritten via your model, right?

    The last paragraphs feel totally like AI.

    Anyway, I'd like a follow-up on the curating, cleaning, and training part, which is far more interesting than hardware selection; we've been selecting hardware for over 25 years.

  • I would be much more interested in a piece on what you can train with this kind of rig, rather than the rig itself.

  • Hey HN, I am sharing my experience of how I pretrained my own LLM on an ML rig I built at home.

  • On a tangent, if I wished to fine-tune one of those medium-sized models like Gemma2 9B or Llama 3.2 Vision 11B, what kind of hardware would I need and how would I go about it?

    I see a lot of guides, but most focus on getting the toolchain up and running; there's not much about what kind of dataset you need for a good fine-tune.

    Any pointers appreciated.
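
    From what I've gathered so far, a 4-bit QLoRA fine-tune of a ~9B model fits on a single 24GB card, so something like this minimal sketch (assuming the Hugging Face transformers/peft/bitsandbytes stack; the model id is real, the LoRA hyperparameters are illustrative, not a recipe) is roughly where I'd start:

      # Loads Gemma 2 9B quantized to 4-bit and attaches LoRA adapters;
      # train the result with your usual Trainer/loop.
      # Requires transformers, peft, bitsandbytes, accelerate.
      import torch
      from transformers import AutoModelForCausalLM, BitsAndBytesConfig
      from peft import LoraConfig, get_peft_model

      bnb = BitsAndBytesConfig(
          load_in_4bit=True,
          bnb_4bit_quant_type="nf4",
          bnb_4bit_compute_dtype=torch.bfloat16,
      )
      model = AutoModelForCausalLM.from_pretrained(
          "google/gemma-2-9b", quantization_config=bnb, device_map="auto"
      )
      lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                        target_modules=["q_proj", "v_proj"])
      model = get_peft_model(model, lora)
      model.print_trainable_parameters()  # typically well under 1% of weights

    What I can't find good guidance on is the dataset side; the rule of thumb I keep seeing is that quality beats quantity, with a few thousand carefully curated examples as a starting point for instruction-style tuning, but pointers to anything more rigorous are welcome.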

  • Nice writeup, but I feel that for most people, the software side of training models should be more interesting and accessible.

    For one, "full" GPU utilization, whether on one GPU or many, remains an open problem in training workflows. Spending effort there, while renting from the cloud, seems more accessible and fruitful to me than finetuning for marginal improvements (see the sketch below).

    This course was a nice source of inspiration - https://efficientml.ai/ - and I highly recommend looking into it to see what to do next with whatever hardware you have to work with.
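
    On the utilization point: the usual first-order fixes are overlapping data loading with compute and using mixed precision. A minimal PyTorch sketch, with synthetic data standing in for a real dataset:

      import torch
      from torch import nn
      from torch.utils.data import DataLoader, TensorDataset

      # Synthetic data stands in for a real dataset.
      ds = TensorDataset(torch.randn(4096, 512), torch.randint(0, 10, (4096,)))
      loader = DataLoader(ds, batch_size=256, num_workers=4,
                          pin_memory=True, persistent_workers=True)

      model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(),
                            nn.Linear(512, 10)).cuda()
      opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
      loss_fn = nn.CrossEntropyLoss()

      for x, y in loader:
          # non_blocking copies from pinned memory overlap with GPU compute
          x = x.cuda(non_blocking=True)
          y = y.cuda(non_blocking=True)
          with torch.autocast("cuda", dtype=torch.bfloat16):  # mixed precision
              loss = loss_fn(model(x), y)
          opt.zero_grad(set_to_none=True)
          loss.backward()
          opt.step()

    Watching nvidia-smi (or torch.profiler) while tuning num_workers and batch size usually reveals whether you're input-bound or compute-bound.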

  • Let's talk riser cables. I keep encountering issues with riser connectors claiming to support PCIe 4.0 that seem to have sub-par signal integrity. They work fine with the GPUs and NICs I tested them with, but attaching an NVMe drive causes all kinds of issues and prevents the machine from booting. I guess NVMe isn't as tolerant of elevated bit error rates.

    That just doesn't inspire a lot of confidence in those risers, so now I'm contemplating MCIO risers.
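
    For what it's worth, on Linux you can read the link a riser actually negotiated from sysfs; a degraded riser often shows up as a lower current_link_speed than max_link_speed, or a retrained narrower width:

      # Prints negotiated vs. maximum PCIe link speed/width per device.
      from pathlib import Path

      for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
          try:
              cur = (dev / "current_link_speed").read_text().strip()
              top = (dev / "max_link_speed").read_text().strip()
              width = (dev / "current_link_width").read_text().strip()
          except OSError:  # not every function exposes link attributes
              continue
          print(f"{dev.name}: x{width} @ {cur} (max {top})")

    Checking the AER status in lspci -vv output for correctable errors is another quick sanity test.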

  • I'd love to read something you wrote, not something you had an AI model write for you.

  • Fun for a wealthy hobbyist, but if you want to do real work, you’re better off renting from Runpod. Good blog though.

  • All you need is 4x 4090 GPUs and a dedicated 30 amp circuit.
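
    The napkin math behind that, assuming nominal TDPs and a North American 120 V circuit:

      gpu_w = 4 * 450      # 4x RTX 4090 at 450 W TDP each
      system_w = 400       # CPU, RAM, drives, fans (rough assumption)
      total_w = gpu_w + system_w    # 2200 W
      print(total_w / 120)          # ~18.3 A continuous at 120 V
      print(total_w / 240)          # ~9.2 A at 240 V
      # NEC sizes circuits so continuous load <= 80% of breaker rating:
      # a 20 A / 120 V circuit allows 16 A (1920 W), already too small,
      # hence a dedicated 30 A circuit (or a 240 V line) for headroom
      # and transient spikes.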

  • Why not 3090s? Same VRAM and cheaper. With both setups you'd be limited to training models around 1B parameters (rough arithmetic below). By contrast, you can run 4-bit quants of Llama 70B on two {3,4}090s, and that's still pretty lobotomized by modern standards.

    You can also train your own model even without GPUs. Just depends on parameter size.
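
    The rough arithmetic behind the ~1B ceiling, assuming mixed-precision training with Adam:

      # Per parameter: fp16 weights (2 B) + fp16 grads (2 B)
      # + fp32 master weights (4 B) + Adam m and v (4 B + 4 B) = 16 bytes,
      # before counting activations.
      params = 1e9
      state_gb = params * 16 / 1024**3
      print(f"~{state_gb:.1f} GB of weight/optimizer state per 1B params")
      # ~14.9 GB -- most of a 24 GB card once activations are added,
      # unless you shard the state across cards (FSDP/ZeRO) or use a
      # more frugal optimizer.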

  • Thanks for sharing. Have you prodded the model with various inputs and written an article that shows various output examples? I'd love to get an idea of what sort of "end product" 4x 4090s is capable of producing.

  • Wouldn't a cluster of M4 Mac minis cost less and provide more usable VRAM (unified memory)? There are posts about people getting decent performance for a lot less than 12k USD.

  • You can get 4060 Ti 16GB cards for ~$450, or 4070 Ti Super 16GB cards for ~$850, instead of ~$2.5k for a 4090. I wonder how well four of those cards would perform. The 4060 Ti's TDP is 165 W versus 450 W for the 4090. The 4070 Ti Super looks like the best tradeoff for cost/power, though: you could probably set up an 8-card 4070 Ti Super 16GB system for less than the 4-card 4090 system (napkin math below).
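
    Napkin math, using the street prices above and nominal TDPs:

      configs = {
          "4x 4090 24GB":          dict(n=4, price=2500, tdp=450, vram=24),
          "8x 4060 Ti 16GB":       dict(n=8, price=450,  tdp=165, vram=16),
          "8x 4070 Ti Super 16GB": dict(n=8, price=850,  tdp=285, vram=16),
      }
      for name, c in configs.items():
          print(f"{name}: ${c['n'] * c['price']}, {c['n'] * c['tdp']} W, "
                f"{c['n'] * c['vram']} GB VRAM")
      # 4x 4090:          $10000, 1800 W,  96 GB
      # 8x 4060 Ti:        $3600, 1320 W, 128 GB
      # 8x 4070 Ti Super:  $6800, 2280 W, 128 GB

    Per-card memory bandwidth and interconnect still favor the 4090s, so more aggregate VRAM doesn't automatically mean faster training.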

  • Couldn't you do better with 2x AGX Orin 64GB?

  • It's probably better to hold out for the 5090 at this point; it's coming very soon and is expected to have 32GB of VRAM.

  • Anyone care to publish AMD training/inference benchmarks using ROCm? They’re hard to find.
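
    Even numbers from a minimal matmul microbenchmark would help. One that runs unchanged on ROCm builds of PyTorch (AMD GPUs show up under the torch.cuda API there); size and dtype are arbitrary:

      import time
      import torch

      assert torch.cuda.is_available()  # true on ROCm builds too
      n = 8192
      a = torch.randn(n, n, device="cuda", dtype=torch.float16)
      b = torch.randn(n, n, device="cuda", dtype=torch.float16)
      for _ in range(3):                # warmup
          a @ b
      torch.cuda.synchronize()
      t0, iters = time.perf_counter(), 20
      for _ in range(iters):
          a @ b
      torch.cuda.synchronize()
      dt = (time.perf_counter() - t0) / iters
      print(f"{dt * 1e3:.1f} ms/matmul, "
            f"~{2 * n**3 / dt / 1e12:.0f} TFLOPS fp16")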

  • Can someone say definitively whether I can just use two independent PSUs, one for some of the GPUs and one for the remaining GPUs plus the motherboard and SATA? No additional hardware?

  • Is anyone else concerned about the power usage of recent AI? Computational efficiency doesn't seem to be a strong point... and for what benefit? IMO the usefulness payoff is too low.

  • Interesting that DLSS 3 is mentioned as an advantage?

  • I'd love to hear the dev story of the H100; it seemed to come out of left field!

  • Where exactly do you plug in this beast?

  • "This needs 30 AMP circuit..." lol

  • All you need is 4x 4090 GPUs to Train Your Own Model -- and $12000 to buy them

  • Yeah, it's powerful, but can it run Crysis?