Compare 75 AI Models on 200 Prompts Side by Side

  • Very nice. If these are pre-computed, is it possible to make a table view that lists every prompt and the answer?

  • According to this site, only GPT-4-Turbo seems to get "What is poisonous for humans but not for dogs?" right. All the other models appear to fail it.