I made 4000 agent calls in Cursor last month. Each model has a personality:

  • I really like the code that Gemini 2.5 Pro writes, but it tends to stop for no apparent reason and has to be re-prompted to continue. I'm not sure why this is. Also, what's the difference between 2.5 Pro and 2.5 Pro Max? Or between Claude 3.7 and 3.7 Max?

    Aside: it would be good for Cursor to tell its agents not to make tool calls that never terminate (like test watchers). I add this to my .mdc files, but I think it would be a good default, so that the agent can run the tests, update the code, and run them again until they pass.

  • Sonnet 3.5 has a very different personality. It's less skilled, but I often opt for it anyway because of that personality.

    DeepSeek is actually pretty good and underappreciated too. It feels unreliable, though. The downside is tool use, but I still prefer it over o3.
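
To make the aside about never-ending tool calls concrete, here is a minimal sketch of the kind of guard it describes: run a command once with a hard time limit instead of letting it sit open like a watcher. This is not how Cursor works internally; `run_with_timeout` is a hypothetical helper invented for illustration, and the 60-second default is an arbitrary assumption.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=60.0):
    """Run cmd once and return (exit_code, output).

    Hypothetical helper, not part of Cursor. exit_code is None when the
    command had to be killed because it exceeded timeout_s seconds.
    """
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode output as str
            timeout=timeout_s,    # the child is killed if it runs longer
        )
        return proc.returncode, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return None, f"timed out after {timeout_s}s"


if __name__ == "__main__":
    # A command that finishes on its own returns normally.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"]))
    # A command that would run "forever" is cut off at the limit.
    print(run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        timeout_s=0.5,
    ))
```

A one-shot test command (for example `pytest -q`, used here only as an illustration) could be passed the same way, which gives the run, edit, run-again loop a definite end on every iteration.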