In case you're wondering what it takes to run it, the answer is 8x H200 141GB [1], a setup that costs about $250k [2].
1. https://github.com/MiniMax-AI/MiniMax-M1/issues/2#issuecomme...
"We publicly release MiniMax-M1 at this https url" in the arxiv paper, and it isn't a link to an empty repo!
I like these people already.
A few thoughts:
* A Singapore-based company, according to LinkedIn. There doesn't seem to be much of a barrier to entry to building a very good LLM.
* Open weight models + the development of Strix Halo / Ryzen AI Max makes me optimistic that running great LLMs locally will be relatively cheap in a few years.
This isn't stated anywhere on the official pages, but it's a Chinese company.
Please come up with better names for these models. This sounds like the processor in my Mac Studio.
They are apparently building buzz for an IPO:
https://www.bloomberg.com/news/articles/2025-06-18/alibaba-b...
> "In our attention design, a transformer block with softmax attention follows every seven transnormer blocks (Qin et al., 2022a) with lightning attention."
Alright, so it's 87.5% linear attention + 12.5% full attention.
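To sanity-check that ratio, here's a trivial sketch of the interleaving the quote describes (only the 7:1 pattern comes from the paper; the total block count below is made up):

```python
# 7 linear ("lightning") attention blocks followed by 1 softmax attention block,
# repeated through the stack. Total depth here is illustrative, not the real model's.
PATTERN = ["lightning"] * 7 + ["softmax"]
num_blocks = 80  # hypothetical depth
layout = [PATTERN[i % len(PATTERN)] for i in range(num_blocks)]

softmax_share = layout.count("softmax") / num_blocks
print(f"softmax attention share: {softmax_share:.1%}")  # 12.5%
```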
TBH I find the terminology around "linear attention" rather confusing.
"Softmax attention" is an information routing mechanism: when token `k` is being computed, it can receive information from tokens 1..k, but it has to be crammed through a channel of a fixed size.
"Linear attention", on the other hand, is just a 'register bank' of a fixed size available to each layer. It's not real attention, it's attention only in the sense it's compatible with layer-at-once computation.
If they trained at this scale without Western cloud infra, I'd want to know what their token throughput setup looks like.
1. This is apparently MiniMax's "launch week": they did M1 on Monday and Hailuo 2 on Tuesday (https://news.smol.ai/issues/25-06-16-chinese-models). It remains to be seen if they can keep up the pace of model releases for the rest of this week. These two were big ones; they aren't yet known for much beyond LLM and video models. Just watch https://x.com/MiniMax__AI for announcements.
2. MiniMax M1's tech report is worthwhile: https://github.com/MiniMax-AI/MiniMax-M1/blob/main/MiniMax_M... While it may not be the SOTA open-weights model, they make some very big/notable claims about lightning attention and their GRPO variant (CISPO).
(I'm unaffiliated, just sharing what I've learned so far since no comments have been made here yet.)