Ask HN: Local LLM Recommendation?

  • This code-focused model leaderboard changes fairly often: https://huggingface.co/spaces/bigcode/bigcode-models-leaderb...

    You will probably have to quantize the model, though. CodeLlama: https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf

    This is the main leaderboard; it changes almost daily and sometimes breaks: https://huggingface.co/spaces/bigcode/bigcode-models-leaderb...

    Some models are trained specifically for coding and some are not, so read the model card / dataset.

    According to another thread on Samantha, I think this one may also be trained on coding, but you may have to test it or find the thread on Samantha v1.2: https://huggingface.co/ehartford/samantha-1.2-mistral-7b

    Without knowing your GPU memory, I can't say which would give the best tokens/sec. You want to offload as much of the model as you can to the GPU (see the sketch below). 10-15 tokens/sec is okay; less than that gets annoying. I would start with a 7B model, then move to an 11B or 13B once that is up and running.
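
    If you end up going the llama.cpp route, GPU offload is just one parameter. A minimal sketch with llama-cpp-python, assuming you've already downloaded a quantized GGUF file (the filename and layer count below are placeholders, not something from this thread):

    ```python
    # Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
    # Adjust model_path and n_gpu_layers to your actual GGUF file and VRAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./codellama-13b-instruct.Q4_K_M.gguf",  # hypothetical local file
        n_gpu_layers=35,  # offload as many layers as fit in VRAM; -1 tries to offload all
        n_ctx=4096,       # context window
    )

    out = llm("Write a Python function that reverses a string.", max_tokens=256)
    print(out["choices"][0]["text"])
    ```

    Watch the tokens/sec it reports and bump n_gpu_layers up or down until the model fits without spilling back to system RAM.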

  • https://ollama.ai

    Set up a 7B model on an M1 Air and it works great. Works on Linux too, but not Windows (yet). Once it's running you can script against its local HTTP API; see the sketch below.
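
    A minimal sketch of calling a local Ollama server from Python, assuming the model has already been pulled (e.g. with `ollama run llama2`) and the server is listening on its default port:

    ```python
    # Minimal sketch: query a local Ollama server over its HTTP API.
    # Assumes the default port (11434) and a model already pulled locally.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama2",  # any model you've pulled with ollama
            "prompt": "Explain what a mutex is in one sentence.",
            "stream": False,    # return one JSON object instead of a stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["response"])
    ```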

  • Vicuna v1.5 13B q8 runs locally on a 3060 Ti (8GB VRAM) under Win 10, on a dual-Xeon (16-core) box with 128GB RAM. I use LM Studio (Mac, Windows & Linux), which is super easy to install, and it has a local inference server that clients can connect to using an OpenAI-style API (example below). A very fun project so far...
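
    A minimal sketch of connecting a client to that local server with the openai Python package; the port is LM Studio's usual default, and the model name and API key are placeholders since the local server doesn't need real credentials:

    ```python
    # Minimal sketch: point the openai client at a local OpenAI-style server
    # (pip install openai). Port 1234 is assumed to be LM Studio's default.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    reply = client.chat.completions.create(
        model="local-model",  # placeholder; use whatever model you loaded
        messages=[
            {"role": "user", "content": "Summarize what quantization does to an LLM."}
        ],
    )
    print(reply.choices[0].message.content)
    ```

    The nice part is that anything written against the OpenAI API can be pointed at the local server by just swapping the base URL.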

  • Run Llama 2 13B locally.