Solving Math Word Problems

  • > Richard, Jerry, and Robert are going to share 60 cherries. If Robert has 30 cherries, and has 10 more than Richard, how many more cherries does Robert have than Jerry?

    > answer:

    > Robert has 30 + 10 = 40 cherries.

    > If there are 60 cherries to be shared, then Richard and Jerry will have 60 - 40 = 20 cherries each.

    > Robert has 40 - 20 = 20 more cherries than Jerry.

    Um, the answer is "correct" but isn't the actual reasoning wrong? Robert already has 30; it's Richard who has 30 - 10 = 20, and Jerry gets the remaining 60 - 30 - 20 = 10.

    Robert has 30

    Richard has 20

    Jerry has 10

    Hence they split the 60 this way.
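    The corrected split checks out with plain arithmetic (a quick sketch; the variable names are mine, not from the problem):

    ```python
    # Verify the commenter's corrected reasoning for the cherry problem.
    total = 60
    robert = 30            # given: Robert has 30 cherries
    richard = robert - 10  # given: Robert has 10 more than Richard
    jerry = total - robert - richard  # Jerry gets the remainder

    print(robert, richard, jerry)   # the 30/20/10 split
    print(robert - jerry)           # how many more Robert has than Jerry
    ```

    This makes the quoted model answer's first step visibly wrong: Robert does not have 30 + 10 = 40; he has 30, and the "+ 10" relates him to Richard.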

  • This might work better if GPT-3 is used to rewrite each statement into an algebraic equation, and then an equation solver is used to solve the system.

  • It’s frustrating how myopic these papers can be. It seems like the goal of the paper is solely to work within the GPT framework to test the theory of verifiers. Why not try verifiers out with other models? Perhaps it’s not a fair comparison, but I remember a Kaggle competition [0] from six years ago which involved building models to solve grade school science multiple choice questions. A simple word2vec model could already achieve 50% accuracy. Despite multiple choice being (maybe?) easier than free response, I’m just skeptical that the way to solve these problems is to throw billions of weights at them. I’m also not convinced that this new dataset avoids having a much smaller effective template space, in which case the models are still just memorizing templates.

    [0]: https://www.kaggle.com/c/the-allen-ai-science-challenge/over...
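    For reference, the kind of similarity baseline that competition invited looks roughly like this: embed the question and each answer option, then pick the most similar option. A toy sketch with made-up 3-d vectors standing in for real word2vec embeddings:

    ```python
    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two embedding vectors."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def pick_answer(question_vec: np.ndarray, option_vecs: list) -> int:
        """Index of the option whose embedding is closest to the question's."""
        return int(np.argmax([cosine(question_vec, v) for v in option_vecs]))

    q = np.array([1.0, 0.2, 0.0])
    options = [
        np.array([0.0, 1.0, 0.0]),  # nearly orthogonal to the question
        np.array([0.9, 0.3, 0.1]),  # close to the question
    ]
    print(pick_answer(q, options))
    ```

    In practice the question and option vectors would be averaged word2vec embeddings of their tokens; the point is only that such a baseline is tiny compared to a billion-parameter model.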

  • For a moment there, the title had me hoping that they were working on the generally undecidable https://en.m.wikipedia.org/wiki/Word_problem_(mathematics)

  • Scoring 55% on a test like this should not be considered a great accomplishment. A sign of progress, yes, but not an accomplishment by itself.

    This is still simply a system that is good at guessing. It does not know anything.