Sparse Llama: 70% Smaller, 3x Faster, Full Accuracy

  • Specifically, this is Llama 2, not Llama 3, which was a bit disappointing. It also wasn't totally clear from the article: will this actually increase GPU inference speed and decrease GPU memory usage?