GPT-4o-mini randomly much slower than GPT-3.5-turbo

I used GPT-4o-mini and GPT-3.5-turbo-0125 to answer the same query, sampling each model 10 times.
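Roughly, the measurement loop looked like this (a sketch assuming the openai Python SDK v1 client; the prompt is a placeholder for my actual query):

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-4o-mini", "gpt-3.5-turbo-0125"]
PROMPT = "..."  # same query for both models (placeholder)

for model in MODELS:
    timings = []
    for _ in range(10):  # sample 10 times per model
        start = time.perf_counter()
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
            max_tokens=500,  # roughly 500 tokens of output
        )
        timings.append(time.perf_counter() - start)
    print(model, [f"{t:.1f}s" for t in timings])
```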

GPT-3.5-turbo speed is consistent (about 5 seconds for 500 tokens).

GPT-4o-mini is usually slightly slower than GPT-3.5-turbo, but about 30% of the time it takes far longer (around 19 seconds for 500 tokens).

Thanks for flagging this. Do you have an example request_id that we can look into to debug this?
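If you're using the Python SDK, one way to capture it is from the x-request-id response header (a sketch, assuming the v1 client's with_raw_response helper):

```python
from openai import OpenAI

client = OpenAI()

# with_raw_response exposes the HTTP response, including the x-request-id header
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
print(raw.headers.get("x-request-id"))  # the request_id to report
completion = raw.parse()  # the usual ChatCompletion object
```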

Thanks for getting back to me. I tested again and the problem seems to have gone away; GPT-4o-mini is now about the same speed as GPT-3.5-turbo, maybe even a bit faster. Any idea whether this was a temporary issue that has been permanently fixed, or whether I'm likely to hit it again during peak times? I recently switched to Gemini Flash for most of my use cases, since it's significantly faster than both of these OpenAI models, but if GPT-4o-mini can at least be consistent, it may be worth another look even if it's a little slower.
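For anyone running a similar comparison, tail latency matters more than the average here. With timings collected as in the loop above, a quick consistency summary might look like this (a sketch; the sample values mirror the pattern I saw, mostly ~5 s with occasional ~19 s outliers):

```python
import statistics

def summarize(timings: list[float]) -> str:
    """Summarize latency samples: typical case, worst case, and spread."""
    return (
        f"median={statistics.median(timings):.1f}s "
        f"max={max(timings):.1f}s "
        f"stdev={statistics.stdev(timings):.1f}s"
    )

# e.g. 3 of 10 samples hitting the slow path, as described above
print(summarize([5.2, 5.0, 19.1, 5.4, 18.7, 5.1, 5.3, 19.4, 5.0, 5.2]))
```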