GPT-3.5 API is 30x slower than ChatGPT equivalent prompt

We are getting incredibly slow responses (~34 seconds) when generating 300 tokens with the GPT-3.5 Turbo API via curl.

The same prompt through ChatGPT 3.5 on the same network and machine is about 1 second.

This is a PLUS user account, and we’ve also paid for API credits, if that matters.

The test prompt is 270 tokens and is just asking for the definition, synonyms, and etymology of a word.

13 Likes

It seems some are indeed getting slower performance than others. Are you in Europe? Antarctica?

One thing you can test is how fast the model gpt-3.5-turbo-instruct works for you (it needs the completions endpoint and a different prompting style than “messages”). When it first came out stealthily, I was getting near 100 tokens per second. Streamed tokens still flow out of it smoothly.

– completion: time 3.426s, 184 tokens, 53.7 tokens/s –
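For reference, a minimal sketch of how the two request shapes differ. The field names follow the public OpenAI REST API; the prompt text itself is just an example:

```python
# Two request payloads for the same task: chat style vs. completion style.

chat_request = {
    # POST https://api.openai.com/v1/chat/completions
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful dictionary."},
        {"role": "user", "content": "Define 'platform' and give synonyms and etymology."},
    ],
    "max_tokens": 300,
}

completion_request = {
    # POST https://api.openai.com/v1/completions
    "model": "gpt-3.5-turbo-instruct",
    # No "messages": the instruction goes in a single free-form prompt string.
    "prompt": "Give the definition, synonyms, and etymology of the word 'platform'.",
    "max_tokens": 300,
}
```

The endpoints are not interchangeable: sending a `messages` array to the completions endpoint (or a bare `prompt` to chat completions) is rejected.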

1 Like

Wow, that’s interesting: gpt-3.5-turbo-instruct completed the request in 3 seconds, which is acceptable.

I’m in Toronto, Canada, averaging 2 Gbps up/down, so I don’t think it’s the connection speed.

Instruct is labeled as legacy; I’m not sure how we can build a production offering on this.

1 Like

That model is the replacement instruction-following completion model for models like text-davinci-003. They just have the playground labeled oddly because they also announced the endpoint was going away, but obviously it hasn’t.

1 Like

I had the same problem. I am using Lambda to interact with gpt-3.5-turbo. It generated about 567 tokens in 90 seconds :sob:

Will gpt-3.5-turbo-instruct also work for large content generation?

It has the same context length of 4,096, and it is easier to get long output from it because it doesn’t have the excessive ChatGPT training. It behaves differently, and is still more like a completion model than an instruct model, so you’ll need to re-engineer your prompts.

1 Like

Got the same issue, I’m a PLUS user as well.
Using the gpt-3.5-turbo model, it takes 50 seconds to create 490 tokens.

"usage": {
    "prompt_tokens": 145,
    "completion_tokens": 345,
    "total_tokens": 490
}
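To compare reports across the thread, it helps to express those numbers as throughput (completion tokens divided by wall-clock time); the 50-second figure is the one quoted above:

```python
# Usage object as returned in the post above, plus the reported wall-clock time.
usage = {"prompt_tokens": 145, "completion_tokens": 345, "total_tokens": 490}
elapsed_s = 50.0

# Only completion tokens count toward generation speed; the prompt is
# processed in a single forward pass, not token by token.
tokens_per_second = usage["completion_tokens"] / elapsed_s
print(f"{tokens_per_second:.1f} completion tokens/s")  # -> 6.9
```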

I’m in San Francisco, CA. It used to take less than 5 seconds before; it’s been slowing down since this afternoon PST. I don’t see any red bars on the ChatGPT status page.
The Playground also seems way slower than before. Is anyone else encountering the same issues?

1 Like

It appears that my business has run into the same issue as yours today, with the GPT-3.5 Turbo API becoming unusually slow.

I am also getting about 10 tokens/second from the GPT-3.5 API, which is very slow compared to a few days ago. I am receiving complaints from customers who have to wait 30 seconds to 1 minute for the usual 300 to 600 tokens per response that my business requires. It was much faster before.

You may not believe this, but I’ve figured out what’s wrong with the speed of gpt-3.5-turbo. I think it is account-related.

I did some experiments using the playground by giving it a prompt like “Give me a full dictionary of ‘platform’.”

It was slow as hell with my production account. But after I switched to another account I’d saved for emergencies, the speed became normal!

It’s not that I did anything funny with my account: I’ve paid all its bills, and it hasn’t gotten any warning emails.

If you’re in a hurry, I suggest experimenting with new or spare accounts.

Interesting that you would find a link to a particular account.

Describe this “paying all its bills” part, though.

In the API, you are either:

  • on monthly billing, where you get billed the following month for your usage;
  • on a prepay plan, where your API calls are simply denied if you don’t have sufficient credits.

(I don’t think OpenAI would warn you by email against giving them money, either :wink:)

Whether it’s a monthly-billed account or a prepaid account doesn’t seem to matter.

One of my good old monthly-billed accounts has gotten slow, one of my prepaid accounts is slow, and a free trial account of mine is slow, too.

I won’t be surprised if OpenAI is running some funny experiments. I’ve read several opinions expressed in this forum that OpenAI treats small-time API users as test subjects. They may be experimenting with throttling output to a human-readable speed.

It feels like luck that one of my spare accounts is not yet affected.

I’ve recorded a video to prove my point: https://www.youtube.com/watch?v=f2Y_3tgWMXI

The one on the left is the working account; the one on the right is the slow one. The prompt for both is: “Give me a full dictionary for the word ‘platform’.”

We are affected too, and we are in the UK. Not sure why it’s so slow. Has OpenAI said anything?

I have reported this post to OpenAI help, but I only got a standard answer (as expected).

Several of my accounts are slow as hell. Only one of my spare accounts has normal speed, so I’m keeping the good one as a last resort.

I can only guess that some accounts are being assigned to crowded nodes (maybe deliberately?).

Slow ones generate at a human-readable speed. If you’re streaming, it’ll at least be bearable for users. But if you’re not streaming, your service is as good as dead. I mean, who’s going to wait 30 to 50 seconds with no output? Users will cancel and go away, but you are still billed for the tokens.
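To illustrate the streaming point, here is a small simulation. The generator stands in for a streamed response; a real call would pass `stream=True` to the chat completions endpoint, but the latency math is the same:

```python
import time

def simulated_stream(n_tokens: int, tokens_per_s: float):
    """Stand-in for a streamed API response at a fixed throughput."""
    for i in range(n_tokens):
        time.sleep(1.0 / tokens_per_s)
        yield f"tok{i} "

start = time.monotonic()
first_token_at = None
for chunk in simulated_stream(n_tokens=20, tokens_per_s=100.0):
    if first_token_at is None:
        first_token_at = time.monotonic() - start  # time to first token
total = time.monotonic() - start  # time to full response

# With streaming, the user sees output after roughly 1/throughput seconds;
# without it, they stare at a blank screen for the full `total` duration.
print(f"first token after {first_token_at:.3f}s, full response after {total:.3f}s")
```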

2 Likes

I’m having slow responses from gpt-3.5-turbo-16k: from 20 seconds to 180 seconds today.
And for other simple prompts, from 3 seconds to 30 seconds.

Is this related to anything I can correct on my end?

Did the symptom start today? If so, more and more people are getting affected by this problem.

Interestingly, GPT-4 is not affected.

Also, I’ve created a new account from a brand-new IP address as a test. It was the same: slow.

I have the same issue; we noticed it starting Friday, Oct. 13. I use gpt-3.5-turbo-16k-0613, and it was taking more than 3 minutes. Today:

gpt-3.5-turbo-0613

result:  {
  object: 'chat.completion',
  created: 1697364500,
  model: 'gpt-3.5-turbo-0613',
  choices: [ { index: 0, message: [Object], finish_reason: 'stop' } ],
  usage: { prompt_tokens: 2436, completion_tokens: 1120, total_tokens: 3556 }
}
time:  26.156s

gpt-3.5-turbo-16k-0613

result:  {
  object: 'chat.completion',
  created: 1697364546,
  model: 'gpt-3.5-turbo-16k-0613',
  choices: [ { index: 0, message: [Object], finish_reason: 'stop' } ],
  usage: { prompt_tokens: 2436, completion_tokens: 787, total_tokens: 3223 }
}
time:  92.874s
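Converting those two runs to throughput (completion tokens divided by wall-clock time, taken from the output above) makes the gap clearer:

```python
# (completion_tokens, elapsed seconds) from the two runs pasted above.
runs = {
    "gpt-3.5-turbo-0613": (1120, 26.156),
    "gpt-3.5-turbo-16k-0613": (787, 92.874),
}

rates = {model: tokens / seconds for model, (tokens, seconds) in runs.items()}
for model, rate in rates.items():
    print(f"{model}: {rate:.1f} tokens/s")
# -> roughly 42.8 tokens/s for the 4k model vs 8.5 tokens/s for the 16k model
```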
1 Like

Same for me. It looks like OpenAI uses some algorithm to temporarily slow down some API accounts. It has happened to my API account a few times in the past; then, within a few days, everything was working as usual.

Same issue: using gpt-3.5-turbo-16k, I’ve gone from an average response of 46 seconds to over 300 seconds now. Many requests are even timing out at 600 seconds. Rerunning the same context and comparing the results shows this.

I also tried creating a new account and using a different key, but had the same issues. This is causing huge problems with my client base and is killing me.

3 Likes