GPT-4-turbo preliminary benchmark results on code-editing

  • So it appears that GPT-4-Turbo is indeed (at least marginally) smarter than the previous GPT-4, just as Altman claimed. Also, it's faster and cheaper, with a massive context window. Exciting!

  • For other (non-code) benchmarks, people are having the opposite experience:

    "I benchmarked on SAT reading, which is a nice human reference point for reasoning ability. I took 3 sections (67 questions) from an official 2008-2009 test (2400 scale) and got the following results:

    - GPT3.5: 690 (10 wrong)
    - GPT4: 770 (3 wrong)
    - GPT4-turbo (one section at a time): 740 (5 wrong)
    - GPT4-turbo (3 sections at once, 9K tokens): 730 (6 wrong)"

    Source: https://twitter.com/wangzjeff/status/1721934560919994823?t=P...
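
    For reference, the raw accuracies implied by those wrong-answer counts (67 questions total) are easy to compute; a quick sketch, with model labels taken from the quote above:

```python
# Raw accuracy per model, from the wrong-answer counts in the quoted tweet.
TOTAL_QUESTIONS = 67

wrong_answers = {
    "GPT3.5": 10,
    "GPT4": 3,
    "GPT4-turbo (one section at a time)": 5,
    "GPT4-turbo (3 sections at once)": 6,
}

accuracies = {
    model: round(100 * (TOTAL_QUESTIONS - wrong) / TOTAL_QUESTIONS, 1)
    for model, wrong in wrong_answers.items()
}

for model, pct in accuracies.items():
    print(f"{model}: {pct}% correct")
```

    Note that the gap between GPT4 and GPT4-turbo here is only 2-3 questions, so it is within the noise you'd expect from a 67-question sample.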

  • Back in April it would only generate a handful of tokens per second. The speed improvements for GPT-4 are staggering. I wonder how much of it is because Microsoft is making GPUs rain on OpenAI, and how much of it is due to improvements to the model and its scaffolding.

  • Over the past few days, ChatGPT went from a great pair-programming helper to a useless, apathetic intern; the quality of the generated code dropped visibly. The context window seems bigger in the ChatGPT Plus version too, but it got dumber.

  • Has anyone been able to access the 128k context window? I'm not seeing that option in the API playground.

  • Reddit thread on the opposite experience: https://www.reddit.com/r/ChatGPT/comments/17prwlg/gpt4_turbo...

  • The progress here is remarkable. A year ago, we didn’t even have ChatGPT. LLM completions were cool, but they were so hard to use, and there was definitely nothing accessible to non-nerds.

  • Aider sounds like a cool tool, I'll have to try it out. I'm assuming it makes use of your local files and edits them for you?

    Are there any other programming assistant packages that use the chatgpt api like this?

    Regarding rate limits, it might be an idea to build configurable delays into the testing code to prevent hitting limits.
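
    One way to do that is a small retry wrapper with exponential backoff; a generic sketch (the function name and parameters here are made up, not part of aider or the OpenAI SDK):

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry request_fn with exponential backoff plus jitter.

    request_fn is any zero-argument callable that makes an API request
    and raises on a rate-limit error (hypothetical interface).
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Sleep base, 2x base, 4x base, ... plus random jitter so
            # parallel test runs don't all retry at the same instant.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

    A fixed configurable delay between test cases works too; backoff just recovers faster when limits are only hit occasionally.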

  • Is this just the API, or does it work in ChatGPT also?

  • Programmers here seem excited about the potential of this new version... but I can't help but wonder how naive this attitude really is. Even if AI never becomes intelligent like us, if it can emulate this intelligence in enough domains, then it has a serious chance of being dangerous. It's already pretty much guaranteed that it will put almost everyone out of a job, turning the vast majority of humans into content-consuming sloths.

    Does it really make sense to play with this kind of power?