Llama2 implementation on Mojo runs at high performance

  • Is there a real link somewhere? What flags was llama2.c built with for the comparison? (Edit: it's built with `make runfast`, which doesn't parallelize across cores... I wonder if that's part of it. I also wonder if BLAS is another reason; I assume Mojo has some accelerated linear algebra library.)

    Llama2.c is a toy and not optimized: its matmul is a plain for loop in C, and it relies entirely on the compiler for speedups. You'd need to compare against llama.cpp for anything credible.
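
    For context, a minimal sketch of what llama2.c's matmul looks like (in the spirit of the routine in run.c, not a verbatim copy). Note the OpenMP pragma is only active when built with `-fopenmp`; per the comment above, `make runfast` doesn't enable that, so the loop runs on a single core and any speedup comes from compiler auto-vectorization:

    ```c
    #include <stdio.h>

    // Naive matrix-vector multiply in the style of llama2.c:
    // W is (d, n) row-major, x is (n,), result xout is (d,).
    // The pragma below is ignored unless compiled with -fopenmp.
    void matmul(float *xout, const float *x, const float *w, int n, int d) {
        #pragma omp parallel for
        for (int i = 0; i < d; i++) {
            float val = 0.0f;
            for (int j = 0; j < n; j++) {
                val += w[i * n + j] * x[j];
            }
            xout[i] = val;
        }
    }

    int main(void) {
        // Tiny smoke test: 2x3 matrix times length-3 vector.
        float w[6] = {1, 2, 3, 4, 5, 6};
        float x[3] = {1, 1, 1};
        float out[2];
        matmul(out, x, w, 3, 2);
        printf("%f %f\n", out[0], out[1]); // expect 6.0 15.0
        return 0;
    }
    ```

    Compare that with llama.cpp, which hand-tunes this inner loop (SIMD intrinsics, quantized kernels, threading), which is why it's the more credible baseline.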