Hacker News

An Empirical Study of Mamba-Based Language Models

by panabee on 6/13/2024, 5:57:10 PM with 3 comments

by jiggawatts on 6/13/2024, 10:20:23 PM
What’s the largest Mamba model that has been trained so far?
Seems like it scales better than transformers, but this would only be really obvious at parameter counts far in excess of the experiments in this paper.