How to Build a Better AI Benchmark