Ask HN: Cheaper or similar setup to an Asus ROG G16 for local LLM development?

  • So running models on an M1 Mac with 32GB+ works very well: the CPU and GPU share the same RAM (unified memory), so you can run some really significant models with it.
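
    As a back-of-envelope check on what fits in that unified RAM: quantized weights take roughly (parameter count x bits per weight / 8) bytes, plus KV cache and OS overhead on top. A minimal sketch in Python; the parameter counts and the ~4.5 bits/weight density (roughly a Q4_K_M quant) are approximations, not measurements:

      # Rough estimate of whether a quantized GGUF model fits in unified RAM.
      # Parameter counts and bits/weight are approximations, not measurements.
      def approx_gguf_gib(params_billions, bits_per_weight=4.5):
          return params_billions * 1e9 * bits_per_weight / 8 / 2**30

      for name, params_b in [("7B", 7), ("Mixtral 8x7B", 46.7), ("70B", 70)]:
          print(f"{name}: ~{approx_gguf_gib(params_b):.0f} GiB of weights")

    On a 32GB machine that puts Mixtral-class models in range at 4-bit, while 70B-class models need a lower-bit quant or more RAM.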

    Earlier this year, I also went down the path of looking into building a machine with dual 3090s. Doing it for <$1,000 is fairly challenging once you add case, motherboard, CPU, RAM, etc.

    What I ended up doing was getting a used rackmount server capable of handling dual GPUs, plus two Nvidia Tesla P40s.

    Examples: https://www.ebay.com/itm/284514545745?itmmeta=01HRJZX097EGBP... https://www.ebay.com/itm/145655400112?itmmeta=01HRJZXK512Y3N...

    The total here was ~$600, and there was essentially no effort in building/assembling the machine, except that I needed to order some Molex power adapters, which were cheap.
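
    Once the cards are in, it's worth confirming the host actually sees both of them; nvidia-smi does this, or programmatically via the nvidia-ml-py (pynvml) bindings. A minimal sketch, assuming the NVIDIA driver is installed and working:

      # List the GPUs the driver exposes; expect two Tesla P40s at ~24 GiB each.
      import pynvml

      pynvml.nvmlInit()
      for i in range(pynvml.nvmlDeviceGetCount()):
          handle = pynvml.nvmlDeviceGetHandleByIndex(i)
          name = pynvml.nvmlDeviceGetName(handle)
          mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
          print(f"GPU {i}: {name}, {mem.total / 2**30:.0f} GiB VRAM")
      pynvml.nvmlShutdown()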

    The server is definitely compact, but it can get LOUD under heavy load, so that might be a consideration.

    It's probably not the right machine for training models, but it runs GGUF inference (via Ollama) quite well. I have been running Mixtral at zippy token rates and smaller models even faster.
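
    If you want actual numbers rather than "zippy", Ollama's API returns generation stats you can turn into tokens/sec. A minimal sketch using the official ollama Python client (pip install ollama), assuming the server is running and the model has already been pulled with 'ollama pull mixtral':

      # Run one prompt against a local Ollama server and report tokens/sec.
      # eval_count is generated tokens; eval_duration is in nanoseconds
      # (field names per Ollama's REST API).
      import ollama

      resp = ollama.generate(model="mixtral",
                             prompt="Explain GGUF quantization in one paragraph.")
      print(resp["response"])
      print(f"~{resp['eval_count'] / (resp['eval_duration'] / 1e9):.1f} tokens/sec")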