Conceptually, it's effectively a GAN
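(For readers unfamiliar with the analogy: a GAN pits two models against each other, one generating candidates and one judging them, each improving against the other. A toy 1-D sketch of that adversarial loop, in plain NumPy with hand-derived gradients, hypothetical and not the paper's actual setup:)

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy 1-D GAN: the "generator" is a single location parameter g,
# the "discriminator" is a logistic regression D(x) = sigmoid(w*x + b).
g = 0.0            # generator starts far from the real data
w, b = 0.0, 0.0    # discriminator parameters
lr = 0.05
real_mean, real_std = 4.0, 0.5

for step in range(3000):
    real = rng.normal(real_mean, real_std, size=64)
    fake = g + real_std * rng.normal(size=64)

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    p_real = sigmoid(w * real + b)
    p_fake = sigmoid(w * fake + b)
    w += lr * (np.mean((1 - p_real) * real) - np.mean(p_fake * fake))
    b += lr * (np.mean(1 - p_real) - np.mean(p_fake))

    # Generator step: ascend log D(fake) (non-saturating loss).
    fake = g + real_std * rng.normal(size=64)
    p_fake = sigmoid(w * fake + b)
    g += lr * np.mean((1 - p_fake) * w)

print(round(g, 2))  # generator mean drifts toward the real mean (~4.0)
```

The point of the analogy: neither side sees ground truth directly, each only sees the other's output, which is also why the "aligned with reality?" objection below is the right question to ask.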
For values of zero quite far above zero.
Perpetual Motion Machines were a thing at some point, too.
Terrible choice of name. DeepSeek developed a historically important model called “R1-Zero” (the predecessor to R1 that was trained without any cold-start SFT; it was very strong, but its chain of thought was hard to read because it code-switches into Chinese and has no line breaks).
I think in formal domains like Lean it should actually be possible to do this from zero--but there seem to be no major successes so far.
OK but how do you ensure it's improving in a direction that aligns with reality?
I still don't understand what a "reasoning" LLM is
Now gamify it.
What could go wrong?
"Starting from a single base LLM"
Ok, zero data, except the data used in the teacher model.