GPT-4 performance is not acceptable for production use cases

After receiving access to the GPT-4 models, we've rolled back to the GPT-3.5 Turbo models for all of our use cases.

A comparison of the same prompt sent to both models through the Chat Completions API can be seen below. That is the 95th-percentile latency…it's even slower on some of our other prompts.
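For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of a p95 timing harness. The helper names and the commented-out API call are my own illustration, not from the original post; it times any zero-argument callable, so you can wrap whichever client call you use.

```python
import math
import time


def nearest_rank_p95(samples):
    """95th-percentile latency using the nearest-rank method."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def measure_p95(call, n=20):
    """Invoke `call` n times and return the p95 wall-clock latency in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return nearest_rank_p95(latencies)


# Hypothetical usage against the Chat Completions API (client setup omitted):
# measure_p95(lambda: client.chat.completions.create(
#     model="gpt-4",
#     messages=[{"role": "user", "content": "Hello"}]))
```

Running the same harness once per model, with an identical prompt and `max_tokens`, gives directly comparable numbers; note that p95 over a small `n` is noisy, so more iterations give a steadier estimate.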

Will we see performance improvements in future releases?


GPT-4 is notoriously slow at the moment, as OpenAI is still scaling its infrastructure to handle all the users. While it may be slower, is GPT-4 doing better in terms of response quality? @jredl


I’ve had a similar experience of GPT-4 being slower than GPT-3.5 Turbo. But it’s worth keeping in mind that the former is still in limited beta, and we’ll likely see speed-ups over time.

Yes, the quality of the responses was much better with GPT-4. However, the API response times aren’t acceptable from a user perspective.

We’re eagerly awaiting the “non-limited” rollout.


While what you say is true, I believe the API latency comes down to bottlenecks in OpenAI’s serving infrastructure rather than anything about the model itself.