I wonder whether they implemented the GRPO correction from this paper, which fixes overly long response lengths: https://arxiv.org/abs/2503.20783
I guess probably not, as they don't mention it.