If the developer’s Christmas bonus depends on scoring high on a particular benchmark, it is not inconceivable that the benchmark would somehow make its way into the training set, directly or indirectly.
I don’t think management would have to encourage it directly, as the Chinese text claims.
VP of AI at FAIR, who is unrelated to the llama/genai team.
https://x.com/Ahmad_Al_Dahle/status/1909302532306092107
“We've also heard claims that we trained on test sets -- that's simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations.”