Hacker News

Reinforcement Learning for Reasoning in LLMs with One Training Example

by delduca on 5/3/2025, 7:59:06 PM with 1 comment

by krackers on 5/10/2025, 10:19:12 PM
Turns out that https://www.fortressofdoors.com/four-magic-words/ was right and all you need to do in training is have the LLM meditate on a single example.