Discrepancy in Response Speed between GPT-3.5-turbo API and ChatGPT UI

anpct · May 17, 2023, 1:11pm

Hello!
We are currently using the GPT-3.5-turbo model via the Open AI API for our application. During our testing phase, we noticed a significant difference in response speed between the API and the ChatGPT UI. We would like to share our observations and seek insights, solutions, and potential workarounds from the community.
Observations:
In our testing, we encountered a noticeable discrepancy in response times. For instance, when sending a prompt with approximately 2300 tokens, we observed response times ranging from 21 to 30 seconds when using the API. However, when using the ChatGPT UI, the response time averaged around 7 seconds. We performed various adjustments to parameters such as temperature, token length, and others, but the speed difference persisted across both platforms.
Request for Insights and Solutions:
We would greatly appreciate insights and potential solutions from the community to help address this speed discrepancy. Our goal is to achieve a more consistent and efficient response time when utilizing the Open AI API. Some specific areas where we seek assistance include:
Optimizing API Performance: We would like to explore techniques or settings to improve the response speed while using the GPT-3.5-turbo model via the API.
Understanding Infrastructure Differences: It would be helpful to gain a deeper understanding of any underlying infrastructure differences between the API and the ChatGPT UI that could account for the variation in response times.
Best Practices and Workarounds: We would like to learn about any best practices or workarounds that developers have successfully employed to mitigate or minimize the response time difference between the API and the ChatGPT UI.

Thank you in advance for your contributions and support in resolving this issue.

firtina · May 17, 2023, 2:43pm

Do you have streaming responses enabled? Streaming gets the first token relatively faster

PaulBellow · May 17, 2023, 5:42pm

You can also search the forums as we have a few threads on this currently.

Welcome to our dev community.

angel.sancho.ferrer · May 24, 2023, 10:00am

And which is the answer? I see all of them waiting for some answer.
Streaming mitigates the user experience, but not the API performance for internal programatic requests

Topic		Replies	Views
How can I improve response times from the OpenAI API while generating responses based on our knowledge base? API chatgpt , api	3	22828	November 9, 2023
Chat GPT's API is significantly slower than the website with GPT Plus API	35	36912	December 12, 2023
ChatGPT API Very Slow at generating Responses API gpt-4 , api	8	5492	December 25, 2023
Chatgpt-3.5 turbo model takes long time to respond. Is there any way to speed this up? API gpt-35-turbo , api-speed	7	6586	December 19, 2023
Performance issue with gpt-4-turbo-preview API API gpt-4 , api , performance	1	1254	February 17, 2024

Discrepancy in Response Speed between GPT-3.5-turbo API and ChatGPT UI

Related topics