When you say "run", what do you actually mean?
If you can install transformers on your iMac, the easiest way is to just check in code real quick: run a few lines of transformers code to load a model and tokenizer. Start with some basic BERT models and move up. If you try a model loader app, especially a full-interface app like textgen or LM Studio, you're adding overhead. The Ollama model server/loader feels quite fast to me, likely because it ships with only a CLI out of the box, but it doesn't support all models, and its smallest model is a quantized orca-mini, which is unlikely to work for you.
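To make that concrete, here's a minimal sketch of the "just run the transformers code" smoke test. The specific model (bert-base-uncased) and the fill-mask task are my own picks for illustration, not something the parent prescribed:

```python
# Quick CPU smoke test: load a small BERT model and tokenizer with transformers.
# Assumes `pip install transformers torch`; bert-base-uncased is just an example.
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Run a quick fill-mask prediction to confirm inference works at all.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
for pred in fill_mask("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```

If that loads and prints predictions in a reasonable time, you know the machine can handle inference before you bother with anything bigger.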
Your hardware should be fine for inferencing, as long as you don't bother trying to get the GPU working.
My $0.02 would be to try getting LocalAI running on your machine with OpenCL/CLBlast acceleration for your CPU. If you're running other things, you could limit the inference process to 2 or 3 threads. That should get it working; I've been able to run inference on even 13B models on cheap Rockchip SoCs. Your CPU should be fine, even if it's a little outdated. (See the sketch after the model links below for how to query it once it's up.)
LocalAI: https://github.com/mudler/LocalAI
Some decent models to start with:
TinyLlama (extremely small/fast): https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGU...
Dolphin Mistral (larger size, better responses): https://huggingface.co/TheBloke/dolphin-2.1-mistral-7B-GGUF
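Here's a rough sketch of talking to a running LocalAI instance over its OpenAI-compatible chat endpoint. The port (8080), the model name ("tinyllama"), and the use of the requests library are my assumptions; match the model name to whatever you actually configure in LocalAI:

```python
# Query a running LocalAI server via its OpenAI-compatible chat completions API.
# Assumes LocalAI is listening on localhost:8080 and a model named "tinyllama"
# (e.g. one of the GGUF files above) has been set up -- adjust both as needed.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "tinyllama",
        "messages": [
            {"role": "user", "content": "Say hello in one short sentence."}
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The nice part of the OpenAI-compatible API is that existing client code mostly just needs the base URL swapped to point at your local box.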