Hacker News

Implementing DeepSeek R1's GRPO algorithm from scratch

by xcodevn on 4/13/2025, 6:33:05 PM with 1 comment

by cubefox on 4/14/2025, 10:09:12 AM
I wonder whether they implemented the GRPO correction from this paper, which fixes overly long response lengths: https://arxiv.org/abs/2503.20783
I guess probably not, as they don't mention it.