TinyGSM: Achieving >80% on GSM8k with small language models

  • "... we find that a duo of a 1.3B generation model and a 1.3B verifier model can achieve 81.5% accuracy, outperforming existing models that are orders of magnitude larger."
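
The quote does not spell out how the generation model and the verifier model are combined. One common arrangement, assumed here purely for illustration, is best-of-N selection: the generator proposes several candidate solutions and the verifier's score picks the one to return. In the minimal Python sketch below, `select_with_verifier`, `toy_generate`, and `toy_score` are hypothetical names, and the toy functions are hand-written stand-ins for the two 1.3B models, not anything taken from the paper.

```python
from typing import Callable, List, Tuple


def select_with_verifier(
    question: str,
    generate: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    num_candidates: int = 4,
) -> Tuple[str, float]:
    """Return the candidate the verifier scores highest, with its score."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    # Step 1: the generator proposes several candidate solutions.
    candidates = generate(question, num_candidates)
    # Step 2: the verifier assigns each candidate a score.
    scored = [(score(question, c), c) for c in candidates]
    # Step 3: keep the highest-scoring candidate.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_generate(question: str, n: int) -> List[str]:
    """Hypothetical generator stub: returns fixed candidates, ignoring the question."""
    pool = ["3 + 4 = 8", "3 + 4 = 7", "3 * 4 = 12", "3 + 4 = 6"]
    return pool[:n]


def toy_score(question: str, candidate: str) -> float:
    """Hypothetical verifier stub: rewards consistent arithmetic and use of addition."""
    lhs, rhs = candidate.split("=")
    a, op, b = lhs.split()
    value = int(a) + int(b) if op == "+" else int(a) * int(b)
    consistent = value == int(rhs)
    uses_addition = op == "+"
    return float(consistent) + float(uses_addition)


if __name__ == "__main__":
    answer, confidence = select_with_verifier(
        "What is 3 + 4?", toy_generate, toy_score, num_candidates=4
    )
    print(answer, confidence)
```

In a real setup the two callables would wrap model inference; the selection logic itself stays the same regardless of how candidates are produced or scored.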