All You Need Is 4x 4090 GPUs to Train Your Own Model

  • This is a great build, thanks for sharing your learnings.

    The best build I have seen so far had 6x 4090s. Video: https://www.youtube.com/watch?v=C548PLVwjHA

      Specifications
      - GPU Accelerator - 6 x 24GB NVIDIA GeForce RTX 4090
      - Processor - Intel Xeon W7-3465X, 28C/56T, 2.5GHz - 4.8GHz
      - Memory - 256GB (8x32GB) DDR5 ECC 4800MHz
      - System Drive  - 2TB Samsung 980 PRO NVMe PCIe 4.0 M.2 SSD
      - Storage Drive - 4TB Samsung 870 EVO SSD
      - Operating System - Ubuntu 20.04
    
    An interesting choice to go with 256GB of DDR5 ECC; if you're spending so much on six 4090s, you might as well try to hit 1 TB of RAM as well.

    The cost of this... not even sure. Astronomical.

  • This article was written or rewritten via your model, right?

    The last paragraphs feel totally like AI.

    Anyway, I'd like a follow-up on the curating, cleaning, and training part, which is far more interesting than hardware selection; we've been selecting hardware for over 25 years.

  • I would be much more interested in a piece on what you can train with this kind of rig, rather than the rig itself.

  • Hey HN, I am sharing my experience of how I pretrained my own LLM on an ML rig I built at home.

  • On a tangent, if I wished to fine-tune one of those medium-sized models like Gemma2 9B or Llama 3.2 Vision 11B, what kind of hardware would I need and how would I go about it?

    I see a lot of guides, but most focus on getting the toolchain up and running; there's not much about what kind of dataset you need for a good fine-tune.

    Any pointers appreciated.
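
    From what I've gathered so far, a 4-bit QLoRA fine-tune of a ~9B model fits on a single 24GB card, so something like this minimal sketch (assuming the Hugging Face transformers/peft/bitsandbytes stack; the model id is real, the LoRA hyperparameters are illustrative, not a recipe) is roughly where I'd start:

      # Loads Gemma 2 9B quantized to 4-bit and attaches LoRA adapters;
      # train the result with your usual Trainer/loop.
      # Requires transformers, peft, bitsandbytes, accelerate.
      import torch
      from transformers import AutoModelForCausalLM, BitsAndBytesConfig
      from peft import LoraConfig, get_peft_model

      bnb = BitsAndBytesConfig(
          load_in_4bit=True,
          bnb_4bit_quant_type="nf4",
          bnb_4bit_compute_dtype=torch.bfloat16,
      )
      model = AutoModelForCausalLM.from_pretrained(
          "google/gemma-2-9b", quantization_config=bnb, device_map="auto"
      )
      lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                        target_modules=["q_proj", "v_proj"])
      model = get_peft_model(model, lora)
      model.print_trainable_parameters()  # typically well under 1% of weights

    What I can't find good guidance on is the dataset side; the rule of thumb I keep seeing is that quality beats quantity, with a few thousand carefully curated examples as a starting point for instruction-style tuning, but pointers to anything more rigorous are welcome.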

  • Nice writeup, but I feel that for most people, the software side of training models should be more interesting and accessible.

    For one, "full" GPU utilization, whether on one GPU or many, remains an open problem in training workflows. Spending effort there, while renting from the cloud, seems more accessible and fruitful to me than finetuning for marginal improvements (see the sketch below).

    This course was a nice source of inspiration - https://efficientml.ai/ - and I highly recommend looking into it to see what to do next with whatever hardware you have to work with.
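
    On the utilization point: the usual first-order fixes are overlapping data loading with compute and using mixed precision. A minimal PyTorch sketch, with synthetic data standing in for a real dataset:

      import torch
      from torch import nn
      from torch.utils.data import DataLoader, TensorDataset

      # Synthetic data stands in for a real dataset.
      ds = TensorDataset(torch.randn(4096, 512), torch.randint(0, 10, (4096,)))
      loader = DataLoader(ds, batch_size=256, num_workers=4,
                          pin_memory=True, persistent_workers=True)

      model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(),
                            nn.Linear(512, 10)).cuda()
      opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
      loss_fn = nn.CrossEntropyLoss()

      for x, y in loader:
          # non_blocking copies from pinned memory overlap with GPU compute
          x = x.cuda(non_blocking=True)
          y = y.cuda(non_blocking=True)
          with torch.autocast("cuda", dtype=torch.bfloat16):  # mixed precision
              loss = loss_fn(model(x), y)
          opt.zero_grad(set_to_none=True)
          loss.backward()
          opt.step()

    Watching nvidia-smi (or torch.profiler) while tuning num_workers and batch size usually reveals whether you're input-bound or compute-bound.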

  • Let's talk riser cables. I keep encountering issues with riser connectors claiming to support PCIe 4.0 that seem to have sub-par signal integrity. They work fine with the GPUs and NICs I tested them with, but attaching an NVMe drive causes all kinds of issues and prevents the machine from booting. I guess NVMe isn't as tolerant of elevated bit error rates.

    That just doesn't inspire a lot of confidence in those risers, so now I'm contemplating MCIO risers.
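
    For what it's worth, on Linux you can read the link a riser actually negotiated from sysfs; a degraded riser often shows up as a lower current_link_speed than max_link_speed, or a retrained narrower width:

      # Prints negotiated vs. maximum PCIe link speed/width per device.
      from pathlib import Path

      for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
          try:
              cur = (dev / "current_link_speed").read_text().strip()
              top = (dev / "max_link_speed").read_text().strip()
              width = (dev / "current_link_width").read_text().strip()
          except OSError:  # not every function exposes link attributes
              continue
          print(f"{dev.name}: x{width} @ {cur} (max {top})")

    Checking the AER status in lspci -vv output for correctable errors is another quick sanity test.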

  • I'd love to read something you wrote, not something you had an AI model write for you.

  • Fun for a wealthy hobbyist, but if you want to do real work, you’re better off renting from Runpod. Good blog though.

  • All you need is 4x 4090 GPUs and a dedicated 30 amp circuit.
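
    The napkin math behind that, assuming nominal TDPs and a North American 120 V circuit:

      gpu_w = 4 * 450      # 4x RTX 4090 at 450 W TDP each
      system_w = 400       # CPU, RAM, drives, fans (rough assumption)
      total_w = gpu_w + system_w    # 2200 W
      print(total_w / 120)          # ~18.3 A continuous at 120 V
      print(total_w / 240)          # ~9.2 A at 240 V
      # NEC sizes circuits so continuous load <= 80% of breaker rating:
      # a 20 A / 120 V circuit allows 16 A (1920 W), already too small,
      # hence a dedicated 30 A circuit (or a 240 V line) for headroom
      # and transient spikes.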

  • Why not 3090s? Same VRAM and cheaper. With both setups you'd be limited to training models around 1B parameters (rough arithmetic below). By contrast, you can run 4-bit quants of Llama 70B on two {3,4}090s, and that's still pretty lobotomized by modern standards.

    You can also train your own model even without GPUs. Just depends on parameter size.
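
    The rough arithmetic behind the ~1B ceiling, assuming mixed-precision training with Adam:

      # Per parameter: fp16 weights (2 B) + fp16 grads (2 B)
      # + fp32 master weights (4 B) + Adam m and v (4 B + 4 B) = 16 bytes,
      # before counting activations.
      params = 1e9
      state_gb = params * 16 / 1024**3
      print(f"~{state_gb:.1f} GB of weight/optimizer state per 1B params")
      # ~14.9 GB -- most of a 24 GB card once activations are added,
      # unless you shard the state across cards (FSDP/ZeRO) or use a
      # more frugal optimizer.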

  • Thanks for sharing. Have you prodded the model with various inputs and written an article that shows various output examples? I'd love to get an idea of what sort of "end product" 4x 4090s is capable of producing.

  • Wouldn't a cluster of M4 Mac minis cost less and provide more usable VRAM (unified memory)? There are posts about people getting decent performance for a lot less than 12k USD.

  • You can get 4060 Ti 16GB cards for ~$450, or 4070 Ti Super 16GB cards for ~$850, instead of ~$2.5k for a 4090. I wonder how well four of those cards would perform. The 4060 Ti's TDP is 165 W versus 450 W for the 4090. The 4070 Ti Super looks like the best tradeoff for cost/power, though: you could probably set up an 8-card 4070 Ti Super 16GB system for less than the 4-card 4090 system (napkin math below).
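
    Napkin math, using the street prices above and nominal TDPs:

      configs = {
          "4x 4090 24GB":          dict(n=4, price=2500, tdp=450, vram=24),
          "8x 4060 Ti 16GB":       dict(n=8, price=450,  tdp=165, vram=16),
          "8x 4070 Ti Super 16GB": dict(n=8, price=850,  tdp=285, vram=16),
      }
      for name, c in configs.items():
          print(f"{name}: ${c['n'] * c['price']}, {c['n'] * c['tdp']} W, "
                f"{c['n'] * c['vram']} GB VRAM")
      # 4x 4090:          $10000, 1800 W,  96 GB
      # 8x 4060 Ti:        $3600, 1320 W, 128 GB
      # 8x 4070 Ti Super:  $6800, 2280 W, 128 GB

    Per-card memory bandwidth and interconnect still favor the 4090s, so more aggregate VRAM doesn't automatically mean faster training.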

  • Couldn't you do better with 2x AGX Orin 64GB?

  • It's probably better to hold out for the 5090 at this point; it's coming very soon and is expected to have 32GB of VRAM.

  • Anyone care to publish AMD training/inference benchmarks using ROCm? They’re hard to find.
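
    Even numbers from a minimal matmul microbenchmark would help. One that runs unchanged on ROCm builds of PyTorch (AMD GPUs show up under the torch.cuda API there); size and dtype are arbitrary:

      import time
      import torch

      assert torch.cuda.is_available()  # true on ROCm builds too
      n = 8192
      a = torch.randn(n, n, device="cuda", dtype=torch.float16)
      b = torch.randn(n, n, device="cuda", dtype=torch.float16)
      for _ in range(3):                # warmup
          a @ b
      torch.cuda.synchronize()
      t0, iters = time.perf_counter(), 20
      for _ in range(iters):
          a @ b
      torch.cuda.synchronize()
      dt = (time.perf_counter() - t0) / iters
      print(f"{dt * 1e3:.1f} ms/matmul, "
            f"~{2 * n**3 / dt / 1e12:.0f} TFLOPS fp16")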

  • Can someone say definitively whether I can just use two independent PSUs, one for some of the GPUs and one for the remaining GPUs plus the motherboard and SATA? No additional hardware?

  • Is anyone else concerned about the power usage of recent AI? Computational efficiency doesn't seem to be a strong point... and for what benefit? IMO the usefulness payoff is too low.

  • Interesting that DLSS 3 is mentioned as an advantage?

  • I'd love to hear the dev story of the H100; it seemed to come out of left field!

  • Where exactly do you plug in this beast?

  • "This needs 30 AMP circuit..." lol

  • All you need is 4x 4090 GPUs to Train Your Own Model -- and $12000 to buy them

  • Yeah, it's powerful, but can it run Crysis?