Completions API Suddenly slow

_j · October 15, 2024, 1:33pm

Performance is down even further from six hours ago. Benchmarking again:

For 3 trials of gpt-4o-2024-08-06 @ 2024-10-15 06:21AM:

Stat	Average	Cold	Minimum	Maximum
stream rate	Avg: 29.233	Cold: 27.5	Min: 27.5	Max: 30.4
latency (s)	Avg: 0.679	Cold: 0.6909	Min: 0.4539	Max: 0.8909
total response (s)	Avg: 18.175	Cold: 19.2412	Min: 17.5793	Max: 19.2412
total rate	Avg: 28.217	Cold: 26.61	Min: 26.61	Max: 29.125
response tokens	Avg: 512.000	Cold: 512	Min: 512	Max: 512

For 3 trials of gpt-4o-2024-05-13 @ 2024-10-15 06:21AM:

Stat	Average	Cold	Minimum	Maximum
stream rate	Avg: 51.333	Cold: 57.5	Min: 46.7	Max: 57.5
latency (s)	Avg: 0.620	Cold: 0.512	Min: 0.512	Max: 0.703
total response (s)	Avg: 10.649	Cold: 9.3955	Min: 9.3955	Max: 11.5906
total rate	Avg: 48.461	Cold: 54.494	Min: 44.174	Max: 54.494
response tokens	Avg: 512.000	Cold: 512	Min: 512	Max: 512

42 → 28 on gpt-4o
85 → 48 on gpt-4o-2024-05-13
27 on gpt-4-turbo

If you are not using the specific features of structured output, you could switch to that versioned model currently performing better.

From past continuous analysis, 6am-9am seems to be the peak slowness time on weekdays, maybe moreso today with yesterday being a US holiday, and everyone getting back to work with their AI questions. You can really see the chunk progress pause and struggle, as though inference is time-slicing between users.

Hopefully the data ops people will be on this.

A week of performance:

Topic		Replies	Views
Extremely long request times- Completions API gpt-4o Bugs gpt-4o	10	537	December 5, 2024
Ongoing latency in GPT 4o this week API	5	266	July 16, 2025
Gpt-4-0125-preview INCREDIBLY slower than 3.5 turbo API	12	9599	July 22, 2024
GPT-3.5 API is very slow. Any fix? API	31	9943	October 12, 2023
Runs randomly take > 30sec Bugs assistants-api	7	727	September 11, 2024

Completions API Suddenly slow

For 3 trials of gpt-4o-2024-08-06 @ 2024-10-15 06:21AM:

For 3 trials of gpt-4o-2024-05-13 @ 2024-10-15 06:21AM:

Related topics