LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Thinking Exp 01-21

  • Some very odd choices in that first plot. Lower is better, but the x-axis is also inverted, so higher scores sit towards the left.