GPT 4 API is Very Slow Still

Please can I get some advice.
I am using the GPT-4 API for a new project, but after prompting I have to wait over a minute for a response. GPT-3.5 Turbo was a lot quicker but obviously less accurate. Are there any fixes for this, or is anything being done to rectify it? I noticed a couple of other forum posts about this issue.
Thank you.


I am averaging about 5 tokens of generated output per second with GPT-4. It varies from 2-7, but the eyeball average is about 5 TPS (output).
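For anyone who wants to compare numbers, here is a minimal sketch of how an output tokens-per-second figure like this could be measured. It assumes you already have some function that sends the request and returns the completion token count; `timed_call` and `tokens_per_second` are hypothetical helper names, not part of any SDK.

```python
import time


def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """Output throughput: generated tokens divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completion_tokens / elapsed_seconds


def timed_call(request_fn):
    """Run request_fn() and return (result, elapsed seconds).

    request_fn is a placeholder for whatever code actually calls the API.
    """
    start = time.perf_counter()
    result = request_fn()
    return result, time.perf_counter() - start


# Example with made-up numbers: 300 generated tokens in 60 seconds.
print(tokens_per_second(300, 60.0))  # 5.0
```

Note that this measures the whole round trip, so network latency and queueing are included in the figure.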

GPT-4 is a lot slower than GPT-3.5, but also, we’ve seen significantly slower generation even with 3.5 in the last few days.
(We’re paid API users.)
Generating 1700 tokens from a prompt of 1300 took over a minute this morning (on gpt-3.5-turbo)

"model": "gpt-3.5-turbo-0301", "usage": {"prompt_tokens": 1329, "completion_tokens": 1692, "total_tokens": 3021}
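To put a rough number on that, here is a small sketch that reads the usage figures above and works out the throughput. The fragment is wrapped in braces so it parses as JSON, and the 60 seconds is an assumption: the request took "over a minute", so the result is only an upper bound on the real rate.

```python
import json

# The usage fragment from the post, wrapped in braces so it is valid JSON.
raw = (
    '{"model": "gpt-3.5-turbo-0301", "usage": {"prompt_tokens": 1329, '
    '"completion_tokens": 1692, "total_tokens": 3021}}'
)
usage = json.loads(raw)["usage"]

# Assumption: exactly 60 s elapsed. The real time was longer, so the
# real throughput was lower than this figure.
elapsed_seconds = 60.0
max_tps = usage["completion_tokens"] / elapsed_seconds

print(f"at most {max_tps:.1f} output tokens/sec")  # at most 28.2 output tokens/sec
```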

So, maybe they’re trying to increase throughput by increasing latency, or this is just a new outcome of buckling under too much success …

(We also get “model is overloaded” errors with some frequency.)
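One common way to cope with intermittent overload errors is to retry with exponential backoff. The sketch below is generic and makes several assumptions: `OverloadedError` is a hypothetical stand-in for whatever exception your client raises on "model is overloaded", `request_fn` is a placeholder for your API call, and the delay values are arbitrary.

```python
import random
import time


class OverloadedError(Exception):
    """Hypothetical stand-in for the client's 'model is overloaded' error."""


def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call request_fn, retrying on OverloadedError with growing delays."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except OverloadedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Double the delay each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

This does nothing for slow generation itself; it only keeps a request from failing outright when the service is briefly overloaded.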


There is a certain lag time in implementing a new model. I only started using GPT-4 regularly through the API a few days ago, even though I've had access for a while.

So there could be a large wave of implementers starting to roll out products based on either GPT-4 or GPT-3.5 many weeks after the initial release, which could partially explain the slowdowns.

I have not had the chance to test GPT-4 API response times since yesterday's ChatGPT release, but I do not believe that release has any impact on the GPT-4 API model. In our earlier testing, it took approximately 60-80 seconds to complete a full 8k prompt and response using the GPT-4 API.

Same here: I'm a paying API user and seeing very slow response times on both gpt-3 and gpt-4. I tested using Postman with stream on and off, and the OpenAI Playground shows the same behaviour in terms of response time. I haven't experienced any “model is overloaded” errors, though.
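When comparing stream on and off, it may help to separate time to first chunk from total time. As far as I understand, streaming mostly changes when the first tokens show up, not how long the whole generation takes. Here is a rough sketch that works on any iterable of chunks; `measure_stream` and `fake_stream` are hypothetical names, and the fake stream just simulates a slow response instead of calling the API.

```python
import time


def measure_stream(chunks, clock=time.perf_counter):
    """Consume an iterable of chunks and report timing statistics."""
    start = clock()
    first_chunk_s = None
    count = 0
    for _ in chunks:
        if first_chunk_s is None:
            first_chunk_s = clock() - start  # latency until the first chunk
        count += 1
    return {
        "first_chunk_s": first_chunk_s,  # None if the stream was empty
        "total_s": clock() - start,
        "chunks": count,
    }


def fake_stream(n, delay):
    """Simulated stream: yields n chunks, sleeping `delay` seconds before each."""
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"


stats = measure_stream(fake_stream(5, 0.01))
print(stats)
```

With a real streaming response you would pass the response iterator in place of `fake_stream(...)`.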

Does anyone still experience errors with GPT-4?

Since my last post above, I’m seeing a 2x speedup in GPT-4, from 5 tokens/sec to now 10 tokens/sec. Also, GPT-4-32k is still the fastest of the GPT-4 models at 20 tokens/sec.
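To translate those rates into wait times, here is a quick back-of-the-envelope sketch. The rates are the ones quoted above; the 500-token completion length is an arbitrary example, and the estimate ignores prompt processing and network overhead.

```python
def estimated_seconds(completion_tokens: int, tokens_per_sec: float) -> float:
    """Rough generation time: tokens to produce divided by the output rate."""
    return completion_tokens / tokens_per_sec


# Rates quoted in the post, in output tokens per second.
rates = {
    "GPT-4 (earlier)": 5,
    "GPT-4 (now)": 10,
    "GPT-4-32k": 20,
}

for label, rate in rates.items():
    print(f"{label}: ~{estimated_seconds(500, rate):.0f} s for 500 tokens")
```

So a 500-token reply would go from roughly 100 seconds to roughly 50 at the doubled rate, and about 25 seconds at the 32k model's rate.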

That’s really interesting.

We just got access to GPT-4 8k today.

Great responses to our calls, but the speed is a bit of a bummer.

Wondering if the increase in speed for models is usually just an incremental thing.

You say the speed has doubled since May, so do you expect the same increase again by November?

The speed of GPT-4 is one of the major complaints against it (you see this all over this forum). But the other one is quality. My main concern/fear is that OpenAI decides to increase the speed at the expense of reducing the model quality.

It’s a balance, and hopefully each and every speed increase does not come at the expense of quality. I’d rather have slow + quality than fast + poor quality.

The challenge here is to be patient and hope that hardware improvements (ASIC/FPGA-level designs and/or much larger GPUs) arrive and, somehow through all the AI mania, actually get produced and make it into reality. Another angle is algorithmic improvements, but those are tricky and take time to iron out.

It took me 2+ years of waiting to finally get a PS5. That’s what we are up against.

Having said that, I have no crystal ball saying what speedups might happen in November, especially for a product under massive demand and usage. I hope for you it’s shorter than waiting for a PS5.



slow + quality : GPT-4

fast + poor quality : GPT-3.5

(That seems to work great right now)

True, very true.
Yes, the lag time is pathetic.

What about the Azure GPT-4 API? Is there a difference in speed compared to the OpenAI API?

Highly doubt it. I don’t see the reason why it would be much different.

It is so slow and out of focus. It keeps giving me wrong answers all the time!