Ask HN: DeepSeek V3's AI Code Review Performance – A Reality Check with Data

  • >Evaluation: Results were assessed by Claude 3.5 Sonnet V2 for consistency

    Why not add a human assessment on top, to check that Claude's assessment is correct?

    >conducted a detailed benchmark

    I suggest you post a sample so others can try to reproduce it.
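
    For what it's worth, one cheap way to do the human check is to hand-label a small subset and measure chance-corrected agreement with the judge model (Cohen's kappa). A rough sketch follows; the label names and the six sample verdicts are made up for illustration and are not from the benchmark:

```python
from collections import Counter


def cohens_kappa(human, judge):
    """Cohen's kappa between two equal-length sequences of labels."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    # Fraction of items where the human and the judge model agree.
    observed = sum(h == j for h, j in zip(human, judge)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    human_counts = Counter(human)
    judge_counts = Counter(judge)
    expected = sum(
        human_counts[label] * judge_counts[label] for label in human_counts
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts on six review comments (illustrative only).
human_labels = ["valid", "valid", "invalid", "valid", "invalid", "invalid"]
judge_labels = ["valid", "invalid", "invalid", "valid", "invalid", "valid"]

kappa = cohens_kappa(human_labels, judge_labels)
print(f"kappa = {kappa:.2f}")
```

    A kappa near 1 would suggest the judge tracks human judgment closely; a value near 0 means the agreement is about what chance alone would give, in which case the automated scores probably shouldn't be trusted on their own.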