GPT-4o-2024-08-06 slower than previous version

I can give you a benchmark.

This particular one uses 1700+ tokens of input: a prior chat turn and a long output preceding the question being posed. When repeated, this should invoke the prompt caching feature of the latest gpt-4o, meaning less computation and less expense, but not necessarily a guarantee of better performance. The client's max_retries = 0, and any errors would be dropped from the report. The calls run synchronously, alternating between models.
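
Roughly, the measurement loop looks like this. This is a minimal sketch, not the actual test script: the prompt contents, stat names, and rounding are placeholders, while the model names, max_tokens of 512, max_retries = 0, and the synchronous model alternation match the setup described above.

```python
import time

from openai import OpenAI

# max_retries=0 as described: a failed call raises immediately
# and is simply dropped from the report.
client = OpenAI(max_retries=0)

MODELS = ["gpt-4o-2024-08-06", "gpt-4o-2024-05-13"]
TRIALS = 5

# Placeholder conversation; the real test used 1700+ tokens of input,
# including a prior chat turn and a long output before the question.
messages = [{"role": "user", "content": "..."}]

def timed_stream(model: str) -> dict:
    """Stream one completion; return latency, total time, and token rates."""
    start = time.perf_counter()
    first = None
    tokens = 0
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=512,  # matches the 512 response tokens in the tables below
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first is None:
                first = time.perf_counter()  # latency = time to first token
            tokens += 1  # assumes roughly one token per content chunk
    total = time.perf_counter() - start
    latency = first - start
    return {
        "latency (s)": round(latency, 4),
        "total response (s)": round(total, 4),
        "stream rate": round(tokens / (total - latency), 3),  # tokens/s after first token
        "total rate": round(tokens / total, 3),               # tokens/s including latency
    }

# Alternate models within each trial, synchronously, so both models face
# similar server conditions and repeats of 08-06 can hit the prompt cache.
for trial in range(TRIALS):
    for model in MODELS:
        print(model, timed_stream(model))
```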

Already, from the progress of the streaming chunk indicators, I can see slowness and pauses in the output of gpt-4o-2024-08-06. Then the results:

For 5 trials of gpt-4o-2024-08-06 @ 2024-10-15 12:17AM:

| Stat | Average | Cold | Minimum | Maximum |
|---|---|---|---|---|
| stream rate (tokens/s) | 44.960 | 36.3 | 36.3 | 48.8 |
| latency (s) | 0.811 | 1.5379 | 0.459 | 1.5379 |
| total response (s) | 12.312 | 15.6141 | 11.0963 | 15.6141 |
| total rate (tokens/s) | 42.244 | 32.791 | 32.791 | 46.142 |
| response tokens | 512.000 | 512 | 512 | 512 |

For 5 trials of gpt-4o-2024-05-13 @ 2024-10-15 12:17AM:

| Stat | Average | Cold | Minimum | Maximum |
|---|---|---|---|---|
| stream rate (tokens/s) | 92.340 | 98.1 | 68.1 | 102.3 |
| latency (s) | 0.467 | 0.5039 | 0.4 | 0.57 |
| total response (s) | 6.128 | 5.7147 | 5.4467 | 7.9079 |
| total rate (tokens/s) | 85.083 | 89.594 | 64.745 | 94.002 |
| response tokens | 512.000 | 512 | 512 | 512 |

"Cold" is the stats from the first call made to the model.

gpt-4o-2024-05-13 at its slowest is still 40.3% faster than gpt-4o-2024-08-06 at its fastest: the older model's minimum total rate of 64.745 tokens/s is 1.403× the newer model's maximum of 46.142 tokens/s.

One caveat is that the caching of 2024-08-06 may also mean that repeated calls get pinned to one server instance (here, every call returned the same system fingerprint), so the trials may not sample the full range of speeds across the different server types and loads in operation.
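
If you want to check this yourself, you can collect the system_fingerprint reported on each streamed chunk (this snippet reuses the client, messages, and TRIALS from the sketch above); a single repeated value across all trials is what suggests the pinning:

```python
# Collect the fingerprints seen across repeated streamed calls; a set
# containing only one value suggests calls are pinned to one server config.
fingerprints = set()
for _ in range(TRIALS):
    for chunk in client.chat.completions.create(
        model="gpt-4o-2024-08-06", messages=messages, max_tokens=512, stream=True
    ):
        if chunk.system_fingerprint:
            fingerprints.add(chunk.system_fingerprint)
print(fingerprints)
```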
