Discrepancy in Response Speed between GPT-3.5-turbo API and ChatGPT UI

We are currently using the GPT-3.5-turbo model via the OpenAI API in our application. During testing, we noticed a significant difference in response speed between the API and the ChatGPT UI, and we would like to share our observations and ask the community for insights, solutions, and potential workarounds.
In our tests we observed a noticeable discrepancy in response times. For example, with a prompt of roughly 2,300 tokens, the API took 21 to 30 seconds to respond, while the ChatGPT UI averaged around 7 seconds. We adjusted parameters such as temperature and maximum token length, but the speed difference persisted across both platforms.
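For anyone who wants to reproduce the comparison, here is a minimal timing sketch. The `timed_call` helper is generic and self-contained; the OpenAI call in the comment is a hypothetical usage (it assumes the `openai` Python package and an API key, neither of which is shown in the thread).

```python
import time

def timed_call(fn):
    """Run fn() and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

# Hypothetical usage (requires the openai package and OPENAI_API_KEY):
# import openai
# reply, seconds = timed_call(lambda: openai.ChatCompletion.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": "your ~2300-token prompt here"}],
# ))
# print(f"API round trip took {seconds:.1f}s")
```

Running the same prompt several times and averaging gives a fairer comparison than a single request, since API latency varies with load.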
Request for Insights and Solutions:
We would greatly appreciate insights and potential solutions from the community to help address this speed discrepancy. Our goal is to achieve a more consistent and efficient response time when using the OpenAI API. Some specific areas where we seek assistance include:
Optimizing API Performance: We would like to explore techniques or settings to improve the response speed while using the GPT-3.5-turbo model via the API.
Understanding Infrastructure Differences: It would be helpful to gain a deeper understanding of any underlying infrastructure differences between the API and the ChatGPT UI that could account for the variation in response times.
Best Practices and Workarounds: We would like to learn about any best practices or workarounds that developers have successfully employed to mitigate or minimize the response time difference between the API and the ChatGPT UI.

Thank you in advance for your contributions and support in resolving this issue.


Do you have streaming responses enabled? Streaming delivers the first token much sooner, which makes responses feel considerably faster.
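As a rough sketch of what "faster first token" means in practice: with `stream=True` the response arrives as an iterator of chunks, so you can measure time-to-first-token directly. The helper below works on any iterable; the OpenAI call in the comment is a hypothetical usage assuming the `openai` Python package.

```python
import time

def first_token_latency(chunks):
    """Consume the first chunk of a streamed response and return
    (first_chunk, seconds_until_it_arrived)."""
    start = time.perf_counter()
    first = next(iter(chunks))
    return first, time.perf_counter() - start

# Hypothetical usage (requires the openai package and an API key):
# import openai
# stream = openai.ChatCompletion.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": "Hello"}],
#     stream=True,  # chunks arrive incrementally as they are generated
# )
# chunk, ttft = first_token_latency(stream)
# print(f"first token after {ttft:.2f}s")
```

Note that streaming does not shorten the total generation time; it only moves the first visible output much earlier, which is what makes the ChatGPT UI feel responsive.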


You can also search the forum, as we have a few threads on this topic currently.

Welcome to our dev community.


So what is the answer? Everyone in these threads still seems to be waiting for one.
Streaming improves the user experience, but it does not improve API performance for internal programmatic requests.
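For internal programmatic workloads where streaming doesn't help, one common workaround is to issue independent requests concurrently, so total wall-clock time approaches the slowest single request rather than the sum of all of them. A minimal sketch with the standard library (the OpenAI call in the comment is a hypothetical usage assuming the `openai` package):

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(fn, prompts, max_workers=8):
    """Apply fn to each prompt concurrently and return results in order.
    Useful when each call is network-bound, as API requests are."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, prompts))

# Hypothetical usage (requires the openai package and an API key):
# import openai
# replies = run_batch(
#     lambda p: openai.ChatCompletion.create(
#         model="gpt-3.5-turbo",
#         messages=[{"role": "user", "content": p}],
#     ),
#     prompts,
# )
```

This doesn't make any single request faster, but for batch jobs it can cut total processing time substantially. Keep `max_workers` below your account's rate limits.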