GPT-4-turbo preliminary benchmark results on code-editing

  • So it appears that GPT-4-Turbo is indeed (at least marginally) smarter than the previous GPT-4, just as Altman claimed. Also, it's faster and cheaper, with a massive context window. Exciting!

  • For other (non-code) benchmarks, people are having the opposite experience:

    "I benchmarked on SAT reading, which is a nice human reference point for reasoning ability. I took 3 sections (67 questions) from an official 2008-2009 test (2400 scale) and got the following results:

    - GPT3.5: 690 (10 wrong)
    - GPT4: 770 (3 wrong)
    - GPT4-turbo (one section at a time): 740 (5 wrong)
    - GPT4-turbo (3 sections at once, 9K tokens): 730 (6 wrong)"

    Source: https://twitter.com/wangzjeff/status/1721934560919994823?t=P...
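
    For reference, the raw accuracies implied by those wrong-answer counts (67 questions total) are easy to compute; a quick sketch, with model labels taken from the quote above:

```python
# Raw accuracy per model, from the wrong-answer counts in the quoted tweet.
TOTAL_QUESTIONS = 67

wrong_answers = {
    "GPT3.5": 10,
    "GPT4": 3,
    "GPT4-turbo (one section at a time)": 5,
    "GPT4-turbo (3 sections at once)": 6,
}

accuracies = {
    model: round(100 * (TOTAL_QUESTIONS - wrong) / TOTAL_QUESTIONS, 1)
    for model, wrong in wrong_answers.items()
}

for model, pct in accuracies.items():
    print(f"{model}: {pct}% correct")
```

    Note that the gap between GPT4 and GPT4-turbo here is only 2-3 questions, so it is within the noise you'd expect from a 67-question sample.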

  • Back in April it would only generate a handful of tokens per second. The speed improvements for GPT-4 are staggering. I wonder how much of it is because Microsoft is making GPUs rain on OpenAI, and how much of it is due to improvements to the model and its scaffolding.

  • Over the past few days, ChatGPT went from a great pair-programming helper to a useless, apathetic intern; the quality of the generated code dropped visibly. The context window seems bigger in the ChatGPT Plus version too, but it got dumber.

  • Has anyone been able to access the 128k context window? I'm not seeing that option in the API playground.

  • Reddit thread on the opposite experience: https://www.reddit.com/r/ChatGPT/comments/17prwlg/gpt4_turbo...

  • The progress here is remarkable. A year ago, we didn’t even have ChatGPT. LLM completions were cool, but they were so hard to use, and there was definitely nothing accessible to non-nerds.

  • Aider sounds like a cool tool, I'll have to try it out. I'm assuming it makes use of your local files and edits them for you?

    Are there any other programming assistant packages that use the chatgpt api like this?

    Regarding rate limits, it might be an idea to build configurable delays into the testing code to prevent hitting limits.
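
    One way to do that is a small retry wrapper with exponential backoff; a generic sketch (the function name and parameters here are made up, not part of aider or the OpenAI SDK):

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry request_fn with exponential backoff plus jitter.

    request_fn is any zero-argument callable that makes an API request
    and raises on a rate-limit error (hypothetical interface).
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Sleep base, 2x base, 4x base, ... plus random jitter so
            # parallel test runs don't all retry at the same instant.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

    A fixed configurable delay between test cases works too; backoff just recovers faster when limits are only hit occasionally.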

  • Is this just the API, or does it work in ChatGPT also?

  • Programmers here seem excited about the potential of this new version... but I can't help but wonder how naive this attitude really is. Even if AI never becomes intelligent like us, if it can emulate this intelligence in enough domains, then it has a serious chance of being dangerous. It's already pretty much guaranteed that it will put almost everyone out of a job, turning the vast majority of humans into content-consuming sloths.

    Does it really make sense to play with this kind of power?