We’ve been running into latency issues when trying out GPT-3.5 Turbo and GPT through the API.
We’re asking the model to create questions based on a simple prompt and return the data as JSON. With text-davinci-003 we wait around 15–25 seconds for the full result to arrive, but with the better models it’s around 45 seconds minimum, often well over 60. Max tokens are around 2000 for the prompt and response combined.
Is this normal or way off? Either way, these latencies make the API pretty much unusable for production.
In my experience, response latency varies a lot. Depending on the time of day and the traffic, my query returns usually take between 24 and 87 seconds (I have a pipeline with multiple GPT calls involved, usually with tenacity used to retry on RateLimitError).
While it is definitely not fit for high-speed production as it is, maybe with Foundry becoming available in the near future, that might solve the problem.
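For anyone curious what the retry-on-RateLimitError pattern mentioned above looks like: tenacity handles this declaratively, but here is a minimal hand-rolled sketch of the same idea. All names here (`RateLimitError`, `with_retries`, `flaky_call`) are illustrative stand-ins, not the real SDK:

```python
# Minimal sketch of retrying on rate-limit errors with exponential
# backoff + jitter. The tenacity library provides the same behavior
# via its @retry decorator; this is just the idea spelled out.
import random
import time

class RateLimitError(Exception):
    """Stand-in for the API client's rate-limit exception."""

def with_retries(fn, max_attempts=6, base_delay=1.0):
    """Call fn(), retrying on RateLimitError with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error
            # back off: 1s, 2s, 4s, ... with jitter, capped at 60s
            delay = min(base_delay * 2 ** (attempt - 1), 60) * random.uniform(0.5, 1.5)
            time.sleep(delay)

# Usage: a call that fails twice, then succeeds on the third attempt
calls = {"n": 0}
def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"

result = with_retries(flaky_call, base_delay=0.01)
```

Note this only smooths over rate limits; it doesn’t reduce the per-call latency itself, and the backoff adds to total wall-clock time when the API is under load.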
At the moment this is (unfortunately) quite common. Around a week ago it was more like 1–4 seconds and quite stable in that range.
If you look in the forum you’ll see a lot of threads regarding high latency and timeouts. In my opinion this is related to the change in infrastructure and the high demand at the moment.
Hope this helps you better assess the situation.