API (GPT-3.5 Turbo) calls taking variable time (ranging from 2-70 sec) on similar input length

arihantbadjatya · October 17, 2023, 12:39pm

I have been querying GPT-3.5 Turbo and am seeing huge variability in how long completion calls take, anywhere from 2 to 70 seconds for inputs of similar length. It would be a great help if someone could suggest solutions, or even potential causes.

Diet · October 17, 2023, 12:53pm

Are you using the streaming API, or just normal responses?

If streaming, are you seeing those ~70s before the first token arrives?
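To answer that, it helps to time the first chunk separately from the whole response. Here is a minimal sketch, assuming your streaming client hands you some iterable of chunks. `time_stream` and `fake_stream` are hypothetical names made up for illustration, and `fake_stream` only simulates a slow first token with `time.sleep` so the example runs without any API call.

```python
import time
from typing import Iterable, Optional, Tuple


def time_stream(chunks: Iterable[str]) -> Tuple[Optional[float], float, int]:
    """Consume a stream and return
    (seconds_to_first_chunk, total_seconds, chunk_count).

    seconds_to_first_chunk is None if the stream yields nothing.
    """
    start = time.perf_counter()
    first_chunk_at = None
    count = 0
    for _ in chunks:
        if first_chunk_at is None:
            # Latency until the very first chunk arrived.
            first_chunk_at = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first_chunk_at, total, count


def fake_stream(first_delay: float = 0.05, gap: float = 0.01, n: int = 5):
    """Hypothetical stand-in for a streaming response (no network involved)."""
    time.sleep(first_delay)  # simulated wait before the first token
    for i in range(n):
        if i > 0:
            time.sleep(gap)  # simulated gap between later tokens
        yield f"tok{i}"


first, total, count = time_stream(fake_stream())
print(f"first chunk after {first:.3f}s, {count} chunks in {total:.3f}s")
```

If the time to first chunk is small but the total is large, the time is going into generating the output. If the first chunk itself takes most of the 70s, the delay is happening before generation starts.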

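If you just use regular (non-streaming) responses, note that output length significantly affects response time, because the whole completion has to be generated token by token before anything is returned. You can limit that by setting the max token length (the max_tokens parameter), although you might end up with cut-off responses.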

Do you see any correlation between response length and the time it took to deliver?
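One rough way to check is to log the completion token count and the wall-clock time of each call, then compute a Pearson correlation. This is only a sketch: the `samples` values below are made-up placeholders, not measurements, and `pearson` is a hand-rolled helper rather than anything from the OpenAI library.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical log of (completion_tokens, seconds) pairs.
# Replace these with values recorded from your own calls.
samples = [(20, 2.1), (150, 9.8), (400, 24.5), (900, 61.0), (60, 4.0)]
tokens = [t for t, _ in samples]
latencies = [s for _, s in samples]

r = pearson(tokens, latencies)
print(f"correlation between output length and latency: {r:.2f}")
```

A value close to 1 would suggest the variability mostly tracks output length. A value near 0 would point at something else, such as load on the server side or network delays.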

FYI: I don’t think query length should have a significant impact on response time.
