Hacker News

Ask HN: DeepSeek V3's AI Code Review Performance – A Reality Check with Data

by Jet_Xu on 12/30/2024, 7:05:43 AM with 1 comment

by homarp on 12/30/2024, 8:04:00 AM
>Evaluation: Results were assessed by Claude 3.5 Sonnet V2 for consistency
Why not add a human assessment on top, to make sure Claude's assessment is correct?
>conducted a detailed benchmark
I suggest you post a sample so others can try to reproduce it.