GPT-4-Turbo and GPT-4o benchmarks released! They do well against the rest of the marketplace

Hi all,
I’m happy to say that OpenAI has finally released benchmarks for the gpt-4-turbo and gpt-4o models, and they both do pretty well.

openai/simple-evals: https://github.com/openai/simple-evals

Additionally, we have results on the LMSYS leaderboard now for subjective preferences:

https://leaderboard.lmsys.org/
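If you’d rather do a quick sanity check of your own than rely only on the published numbers, both models are queryable through the standard API. Here’s a minimal sketch, assuming the openai Python package (v1.x) and an OPENAI_API_KEY in your environment; the prompt is just an arbitrary example:

```python
# Minimal sketch: send one prompt to both models and compare the
# answers and round-trip latency. Assumes the openai Python package
# (v1.x) and OPENAI_API_KEY set in the environment.
import time
from openai import OpenAI

client = OpenAI()
prompt = "What is the capital of Australia? Answer in one word."

for model in ("gpt-4-turbo", "gpt-4o"):
    start = time.time()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.time() - start
    print(f"{model}: {resp.choices[0].message.content!r} ({elapsed:.1f}s)")
```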


Will there be a free version of gpt-4o, or is it only for people who pay?

If you’re going to use it through the API, it will of course be paid, like all API products.

But, according to the announcement, we should get the free version in ChatGPT (new domain, yay!) in the coming days.


The first question I posed to ChatGPT-4o was answered quite quickly. After that, though, it got slower and slower, to the point that I now have to go away and do something else while I wait. This whole post was typed while waiting, and I’m still waiting.
Update: Waited a long time. Still waiting…
Stopped and restarted it. The system is not very responsive.

It’s possible that they are experiencing load-balancing issues as traffic increases during the rollout of this new model.
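If you’re hitting those transient failures through the API, a retry with exponential backoff will usually ride them out. A minimal sketch, assuming the openai Python package (v1.x); the retry count and delays here are arbitrary choices, not anything OpenAI recommends:

```python
# Minimal sketch: retry an API call with exponential backoff when the
# service is overloaded. Assumes the openai Python package (v1.x);
# retry count and delays are arbitrary, not OpenAI defaults.
import time
from openai import OpenAI, APIError

client = OpenAI()

def ask_with_retry(prompt, model="gpt-4o", retries=5):
    delay = 1.0
    for attempt in range(retries):
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except APIError:
            if attempt == retries - 1:
                raise  # out of retries; let the caller see the error
            time.sleep(delay)
            delay *= 2  # back off: 1 s, 2 s, 4 s, ...

print(ask_with_retry("Say hello."))
```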

It’ll be free with limited usage in ChatGPT, but in the API it’s paid, though it’s cheaper than the previous model.

Well, thanks for the heads-up. I get that things don’t always go smoothly. I am back here again while I wait for ChatGPT-4o to complete. It’s been a while, and it is stuck on a symbol that seems to indicate it believes it is still working, but the output hasn’t moved in about ten minutes. I am on the paid plan, though I wouldn’t think that should matter to whether it fundamentally works. If it’s a load issue, then the system should indicate that there will be a delay. Since it shows no sign of coming back, I am stuck stopping it and trying again; in my experience so far, that means starting a new session.

Given these issues, someone should tweak the interface so that it gives some ongoing indication of status as it works; then, if it freezes, it would at least be clear that it is no longer updating and should be stopped and retried. Just checked, and it is frozen, so I’m trying again. It chugged away very slowly and then stopped again. Of the AIs I have been using in recent months (Poe, Gemini, Claude, GPT-4, Groq, etc.), this is the slowest so far. There must be thousands of programmers like me using this system. Perhaps you should reach out and ask for assistance?
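On the API side, at least, you can give yourself that ongoing status indication by streaming the response: tokens arrive incrementally, so a stall is visible right away instead of after ten minutes of a frozen spinner. A minimal sketch, assuming the openai Python package (v1.x), which uses httpx under the hood; the 30-second read timeout is an arbitrary choice, not an OpenAI default:

```python
# Minimal sketch: stream the reply so progress is visible token by token,
# and abort if no data arrives for 30 seconds. Assumes the openai Python
# package (v1.x) with httpx; the 30 s read timeout is an arbitrary choice.
import httpx
from openai import OpenAI, APITimeoutError

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a merge sort in Python."}],
    stream=True,
    timeout=httpx.Timeout(60.0, read=30.0),  # fail fast if the stream stalls
)

try:
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
except (APITimeoutError, httpx.ReadTimeout):
    print("\n[no output for 30 s; treat the request as stuck and retry]")
```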

Update: I was unable to get ChatGPT-4o to complete the task I was working on in Chrome. When I looked at the developer tools, there were a variety of errors, most seemingly due to cross-site scripting issues. Rather than troubleshoot that, I opened Firefox and got it working again, albeit at a glacial pace (it was still running as I typed this update).
Of those errors, the most worrisome performance-wise were attempts to read files from other sites for things like fonts, which should not be needed at all. Communication is one of the slowest and flakiest parts of computing and should be avoided whenever possible. One of the main reasons Groq is so fast is that they designed out most of the communication precisely to increase performance.
Anyway, all’s well that ends well. The one thing I was working on finally finished.