The way we measure progress in AI is terrible