GPT-3.5 API is very slow. Any fix?

Getting incredibly slow responses (~34 seconds) when generating 300 tokens with the GPT-3.5 Turbo API.

The same prompt through ChatGPT (GPT-3.5) takes about 1 second.

This is a Plus user account, and I’ve also paid for API credits, if that matters.

There have been no significant deviations in response generation time in the past 7 days; any effects you are experiencing must be local to you, either a local instance issue or an edge server problem.

Thanks for sharing that. Where can I access it? I’m having the same issue: all of a sudden my 3.5 Turbo response times are 3x longer than they were just 72 hours ago.

You can find this super useful site at https://openai-status.llm-utils.org, created by one of our very own forum members.

I imagine that any slowdown, if it is indeed caused by a server issue, will be addressed quickly, though it can take time both to detect and to resolve these issues. Reporting issues through help.openai.com also ensures that tracking and issue monitoring get notified.


I am also seeing slow response times for gpt-3.5-turbo API calls. The graph also shows a latency peak.

Short latency peaks are common for almost all remote API services worldwide. They can be caused by local issues, test environment issues, actual service issues, and a whole host of connectivity problems.

Applications that use remote APIs should always assume that the endpoint may be unresponsive. They should include suitable error handling, such as retries with exponential backoff, and should make any blocking calls in their own threads so the calls can be monitored and, if required, the user can be updated with progress.
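Here is a minimal sketch of that advice in plain Python, with no API client involved. It assumes the remote request is wrapped in a zero-argument callable; the names `run_with_timeout`, `call_with_retries`, and `AttemptTimeout` are hypothetical. Each attempt runs in a worker thread so the caller can report progress and give up after a time budget, and failures are retried with capped exponential backoff plus random "full jitter".

```python
import random
import threading
import time


class AttemptTimeout(Exception):
    """Raised when a single attempt exceeds its time budget (hypothetical name)."""


def run_with_timeout(fn, timeout, on_progress=None, poll_interval=1.0):
    """Run fn() in a daemon thread and wait at most `timeout` seconds.

    The calling thread stays free to report progress while it waits.
    Python cannot kill a thread, so a timed-out call keeps running in the
    background until it returns; being a daemon, it won't block exit.
    """
    result = {}

    def worker():
        try:
            result["value"] = fn()
        except Exception as exc:  # hand the error back to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    start = time.monotonic()
    thread.start()
    while thread.is_alive():
        elapsed = time.monotonic() - start
        if elapsed >= timeout:
            raise AttemptTimeout(f"no response after {elapsed:.1f}s")
        thread.join(min(poll_interval, timeout - elapsed))
        if thread.is_alive() and on_progress:
            on_progress(f"still waiting... {time.monotonic() - start:.1f}s")
    if "error" in result:
        raise result["error"]
    return result["value"]


def call_with_retries(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      attempt_timeout=60.0, on_progress=None,
                      retryable=(Exception,), sleep=time.sleep):
    """Call fn() with a per-attempt timeout and exponential backoff.

    In real code, narrow `retryable` to transient failures (timeouts,
    HTTP 429 / 5xx); anything not listed propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_with_timeout(fn, attempt_timeout, on_progress)
        except retryable as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Backoff ceiling doubles each attempt (1s, 2s, 4s, ...), capped at
            # max_delay; "full jitter" picks a random delay below the ceiling.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = random.uniform(0, ceiling)
            if on_progress:
                on_progress(f"attempt {attempt} failed ({exc!r}); "
                            f"retrying in {delay:.1f}s")
            sleep(delay)
```

Usage: `call_with_retries(lambda: send_request(prompt), on_progress=print)`, where `send_request` stands in for whatever function actually makes the API call. Passing `sleep` as a parameter keeps the backoff testable without real waiting.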

Seeing the same issue: the GPT-4 response for the same query takes about 40 seconds, whereas GPT-3.5-turbo has consistently taken around 2.5 minutes. I’ve been testing every day since Friday.

Python test code to run (and a slowness measurement confirmed by another user)
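The test code referenced above isn't reproduced in this thread. As a hypothetical stand-in, a timing harness only needs to start the request inside the timed region, record when the first chunk arrives, and divide the count by wall time. The sketch below treats each streamed text chunk as one token, which is an approximation, and mimics the `[N tokens in Xs. Y tps]` log format quoted in this thread. `measure_stream` and `format_report` are illustrative names.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    start_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[str, int, float, Optional[float]]:
    """Start a streamed response and time it end to end.

    start_stream is called inside the timed region, so request setup and
    network latency are included. Returns (text, n_chunks, elapsed_s,
    time_to_first_chunk_s); the last value is None for an empty stream.
    """
    start = clock()
    first_chunk_at = None
    parts = []
    for chunk in start_stream():
        if first_chunk_at is None:
            first_chunk_at = clock() - start  # latency before output begins
        parts.append(chunk)
    elapsed = clock() - start
    return "".join(parts), len(parts), elapsed, first_chunk_at


def format_report(n_tokens: int, elapsed: float) -> str:
    """Format a result like the logs quoted in this thread."""
    tps = n_tokens / elapsed if elapsed > 0 else float("inf")
    return f"[{n_tokens} tokens in {elapsed:.1f}s. {tps:.1f} tps]"
```

Recording time-to-first-chunk separately helps distinguish "the server is slow to start responding" from "tokens arrive slowly once started". Assuming v1 of the official `openai` Python package, `start_stream` could wrap a `client.chat.completions.create(..., stream=True)` call and yield each chunk's `choices[0].delta.content` text.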

I’m still getting good speeds. Compare the “latency” of a 1-token response to a full 512-token response:

Title
[1 tokens in 1.0s. 1.0 tps]
Title: Embracing Digital Transformation: Unlocking the Power of the Digital Age

[128 tokens in 1.9s. 67.6 tps]
Title: Embracing Digital Transformation: Unlocking the Power of the Digital Age

[512 tokens in 7.2s. 70.8 tps]
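A quick back-of-the-envelope comparison puts these numbers next to the opening report. It assumes all quoted times are end-to-end wall-clock times for the full response:

```latex
\begin{aligned}
\text{opening report:}\quad & \frac{300\ \text{tokens}}{34\ \text{s}} \approx 8.8\ \text{tokens/s} \\
\text{512-token run above:}\quad & \frac{512\ \text{tokens}}{7.2\ \text{s}} \approx 71.1\ \text{tokens/s} \\
\text{marginal rate (128 to 512 tokens):}\quad & \frac{512 - 128}{7.2 - 1.9} = \frac{384}{5.3} \approx 72.5\ \text{tokens/s}
\end{aligned}
```

The 71.1 differs slightly from the logged 70.8 tps, presumably because the displayed 7.2 s is rounded. Subtracting the 128-token run cancels out fixed per-request overhead, assuming that overhead is similar between runs. Because the marginal rate is close to the overall rate, this run's time is dominated by generation speed. The opening report works out to roughly 8x slower generation.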

(Post-pay account, Western US.)

Unlike other reports of massive slowdowns in the last few days:

So this is not just a case of “blame it on intermittent stuff and the user”.

It does, however, appear to be “sticky” to particular users. Reports of where you connect from geographically, whether you are on prepay, monthly billing, or a free trial, whether you have ever paid a bill, and so on could help determine why some users are affected and others are fast.

Yup, there has 100% been an uptick in the number of people reporting slow performance, but given the total number of members, that number seems fairly small. So it’s either a geographic issue with a particular node or, as you mention, something account-based… maybe?

As I say on every one of these problem reports, please send them to help.openai.com. At the very least, that gets the issue in front of an AI looking for commonalities. Similar things happen here, but doubling up on visibility won’t hurt awareness.

We’ve just reported the problem. We have been seeing a 3x to 4x slowdown with gpt-3.5-turbo over the last few days. We tried other 3.5 models, and they are all the same (London-based).


It’s the rate limit, maybe?
It works fine for me with the maximum rate limit (thanks, OpenAI).

My responses are fast, and I have never needed to request higher rate limits, so, in my case at least, your idea doesn’t match which users are experiencing performance problems. Some of the affected forum users are API developers at the “my customers are complaining” tier.

It could still be some other, unexplored facet of particular accounts that causes the slow output.