OpenAI: Why Are the API Calls So Slow? When Will It Be Fixed?

This is a question for OpenAI: why are API calls not as fast as the Playground? Two weeks ago it was bad and I thought OK, you guys will fix it, but no, it’s much worse today! Responses are taking over a minute; do you realize how ridiculous that is? OpenAI, please enlighten us with your plan. What’s going on with the slow API calls?


Seeing the same here. Does being on a free grant make a difference?

I don’t think there is a difference in response time, but the free tier has tighter rate limits.

There’s more info on the rate limits here:

I’ve also noticed the API being much slower over the last few days. Nothing in my code has changed.

I’m using chat completions, simple prompts and gpt-3.5-turbo.

What used to be 5-second responses now take anywhere between 20 and 40 seconds, some up to a minute.

I’m not being rate limited in any way.
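One way to confirm that rate limiting isn’t involved: a rate-limited request comes back as HTTP 429, and the API also reports remaining quota in response headers. The sketch below assumes the x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens header names from OpenAI’s rate-limit documentation; inspectRateLimit and HeaderLike are hypothetical helpers for illustration, not part of any SDK.

```typescript
// Hypothetical helper: summarize rate-limit information from one API response.
// Assumption: header names follow OpenAI's rate-limit docs; adjust if yours differ.

// Anything with a fetch-style get(name) method, e.g. Response.headers in Node 18+.
interface HeaderLike {
  get(name: string): string | null;
}

interface RateLimitInfo {
  rateLimited: boolean; // true when the API answered with HTTP 429
  remainingRequests: number | null; // null when the header is absent or not a number
  remainingTokens: number | null;
}

function parseIntOrNull(value: string | null): number | null {
  if (value === null) return null;
  const n = Number.parseInt(value, 10);
  return Number.isNaN(n) ? null : n;
}

function inspectRateLimit(status: number, headers: HeaderLike): RateLimitInfo {
  return {
    rateLimited: status === 429,
    remainingRequests: parseIntOrNull(headers.get("x-ratelimit-remaining-requests")),
    remainingTokens: parseIntOrNull(headers.get("x-ratelimit-remaining-tokens")),
  };
}
```

If calls are slow but never return 429 and the remaining counts stay well above zero, rate limiting is probably not the cause of the slowness.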

I am using the free API account, but only because my initial free credit hasn’t been used up yet. No idea if I’m being de-prioritised because of this. The Playground and ‘consumer’ web interfaces are nice and speedy compared with the API, at least at the moment.


Just to add to this, and to try to speed things up from a user’s perspective, I’m going to have a go at using stream:true in the request so I can start receiving the response sooner. I’m using Node.js as the back end, so I have a little reading to do first on how to implement it.

Check this: ChatGPT-Desktop/openAi.ts at 298cb4f9b95fb0df8f1cbed7ea6268f8306db036 · Synaptrix/ChatGPT-Desktop · GitHub
It’s simple, just like a normal request, but be aware of the different response structure.
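A minimal sketch of handling that different structure, assuming the server-sent-events format chat completions use with stream: true: each event is a line starting with "data: " that carries a JSON chunk whose choices[0].delta.content holds the next text fragment, and the stream ends with "data: [DONE]". ChatStreamParser is a hypothetical name; it buffers partial lines because a network chunk can end in the middle of a line.

```typescript
// Hypothetical parser for a streamed chat completion (server-sent events).
// Assumed wire format: "data: <json>" lines separated by blank lines,
// each JSON chunk carrying choices[0].delta.content, ending with "data: [DONE]".
interface ChatChunk {
  choices: { delta: { content?: string } }[];
}

class ChatStreamParser {
  private buffer = "";
  done = false;

  // Feed one decoded network chunk; returns the text fragments it completed.
  push(text: string): string[] {
    this.buffer += text;
    const lines = this.buffer.split("\n");
    // The last element may be an incomplete line; keep it for the next push.
    this.buffer = lines.pop() ?? "";
    const out: string[] = [];
    for (const raw of lines) {
      const line = raw.trim(); // also strips a trailing "\r"
      if (!line.startsWith("data:")) continue; // skip blank separator lines
      const payload = line.slice("data:".length).trim();
      if (payload === "[DONE]") {
        this.done = true;
        break;
      }
      const chunk = JSON.parse(payload) as ChatChunk;
      // The first chunk usually only carries the role, so content may be absent.
      const piece = chunk.choices[0]?.delta.content;
      if (piece) out.push(piece);
    }
    return out;
  }
}
```

In Node 18+ you would send the usual POST to the chat completions endpoint with "stream": true in the JSON body, decode each chunk of response.body with a TextDecoder, and pass the text to push() until done is true, appending the returned fragments to the UI as they arrive.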

Unfortunately, streaming doesn’t really “speed” things up.
But at least you have something to watch until it’s done. :wink:
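To see the difference between perceived and actual speed in numbers, you can time the first fragment separately from the whole response. timeStream below is a hypothetical helper that works on any async iterable of text pieces (for example, the fragments produced while reading a streamed response); the clock is a parameter so it can be tested without a network.

```typescript
// Hypothetical helper: measure time-to-first-fragment vs total time of a stream.
interface StreamTiming {
  firstTokenMs: number | null; // null if the stream produced nothing
  totalMs: number;
  text: string;
}

async function timeStream(
  pieces: AsyncIterable<string>,
  now: () => number = () => Date.now(), // injectable clock, in milliseconds
): Promise<StreamTiming> {
  const start = now();
  let firstTokenMs: number | null = null;
  let text = "";
  for await (const piece of pieces) {
    // Record the delay only once, when the first fragment arrives.
    if (firstTokenMs === null) firstTokenMs = now() - start;
    text += piece;
  }
  return { firstTokenMs, totalMs: now() - start, text };
}
```

If streaming only improves perceived latency, firstTokenMs should be much lower than totalMs, while totalMs stays close to the time a non-streamed request takes.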

Has anyone tried rolling back to the GPT-3 API, and what does the throughput look like?

@ajondo You’re right of course, but it does give the impression of a faster response, particularly if the reply is a bit long.

Note for others: I implemented streaming yesterday, and I have to say that it’s a much better user experience.

Back onto the main topic of ‘slow API’. The last few days have been really quite bad. Even with streaming, a response could take a long time to ‘start’. But last night, as I was testing my new streaming interface, I noticed some odd but promising behaviour. Randomly I would get very quick responses. They were rare at first, but definitely increased in frequency. This morning, all responses have been quick so far.

So, the whole thing looks like a capacity issue to me. Not great if you are building a commercial app. I think OpenAI needs to carve out more capacity for paying API users vs free ‘have a play’ accounts, but who knows what’s happening behind the scenes…


@ajondo, is there a simple way to implement streaming? I’m using gpt-3.5-turbo and have everything set up already, but the load times are killing me.

Has anything changed in your experience? This is ridiculously slow. Depending on how I query the API, I either have to wait almost 20 seconds or I get the error “That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at if the error persists.” (The model was turbo.) I’m using an unofficial .NET library, so that might also be causing some issues.
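For the intermittent “model is currently overloaded” error, a common client-side mitigation is to retry with exponential backoff. This is a generic sketch, not something the API or any particular library provides; withRetry, RetryOptions, and the choice of which errors count as retryable are assumptions to adapt (for example HTTP 429 and 5xx responses, or that error message).

```typescript
// Hypothetical helper: retry an async call with exponential backoff and full jitter.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // backoff cap for the first retry
  maxDelayMs: number; // upper bound on any single wait
  isRetryable: (err: unknown) => boolean; // caller decides what is transient
  sleep: (ms: number) => Promise<void>; // injected so tests need not wait
}

async function withRetry<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on permanent errors or once the attempt budget is spent.
      if (attempt >= opts.maxAttempts || !opts.isRetryable(err)) throw err;
      // Full jitter: wait a random time up to base * 2^(attempt - 1), capped.
      const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (attempt - 1));
      await opts.sleep(Math.random() * cap);
    }
  }
}
```

In real code, sleep would be something like (ms) => new Promise((resolve) => setTimeout(resolve, ms)). The random jitter keeps many clients from retrying in lockstep against an already overloaded model.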

I am getting 47-second response times. Are API responses actually human-generated? :slight_smile: It’s not really a usable API for anything other than gathering training data or something.

Right now an API call takes about 10 seconds, even for simple questions like ‘How many ethnic groups are there in the United States?’

I’ve noticed that at certain hours of the day it’s much faster :smiley: Could also be something else… but yeah, iterating on prompts is truly a pain because it’s so slow.
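To check whether certain hours really are faster, you could log the latency of each call yourself and compare per-hour medians, which are less skewed than averages by the occasional one-minute outlier. Sample and medianLatencyByHour are hypothetical names, and grouping by UTC hour is an arbitrary choice.

```typescript
// Hypothetical helper: median API latency per hour of day (UTC).
// Assumes you record one Sample per call in your own logging.
interface Sample {
  at: Date; // when the request was sent
  latencyMs: number; // how long it took
}

function medianLatencyByHour(samples: Sample[]): Map<number, number> {
  // Group latencies by the UTC hour the request was sent.
  const byHour = new Map<number, number[]>();
  for (const s of samples) {
    const hour = s.at.getUTCHours();
    const list = byHour.get(hour) ?? [];
    list.push(s.latencyMs);
    byHour.set(hour, list);
  }
  // Reduce each group to its median.
  const result = new Map<number, number>();
  byHour.forEach((list, hour) => {
    const sorted = list.slice().sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    const median =
      sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    result.set(hour, median);
  });
  return result;
}
```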

I’m encountering the same issue with the DALL-E 2 API. When using the official tool, generating an image takes only 10 to 20 seconds. However, when using the API, it takes 70 to 90 seconds.

Maybe a bit late, but I get the same speed in the Playground and in the API. I think the big difference lies in how the response is delivered: the Playground shows the answer word by word, while the API returns the full answer after some time. In my tests, the full/overall response time was the same.

I was getting higher-than-usual latency yesterday for a few hours; today it’s back to normal. My response times on the API and the website are the same, depending on the model used. Yesterday I was getting over a second and a half of latency on both web and API; today I’m pulling right around ⅒ of a second.

Damn, that is fast. How did you do it? My API calls in JS take 20 seconds or so.

aichat is my primary usage:

$ cargo install aichat --force

aichat is written in Rust, but if you go to GitHub you’ll see I have it running even on an unrooted Android phone (and I have some Tasker tasks, like replying to certain emails and texts): GitHub - 0m364/aichat: Using ChatGPT/GPT-3.5/GPT-4 in the terminal. I set this up to run on unrooted Android.

Thanks a lot, I will look into that. :slight_smile: