GPT-3.5 API is 30x slower than ChatGPT equivalent prompt

We are getting incredibly slow responses (~34 seconds) when generating 300 tokens with the GPT-3.5 Turbo API via curl.

The same prompt through ChatGPT 3.5 on the same network and machine is about 1 second.

This is a PLUS user account, and we’ve also paid for API credits, if that matters.

The test prompt is 270 tokens and is just asking for the definition, synonyms, and etymology of a word.

13 Likes

It seems some are indeed getting slower performance than others. Are you in Europe? Antarctica?

One thing you can test is how fast the model gpt-3.5-turbo-instruct works for you (it needs the completions endpoint and a different prompting style than “messages”). When it first came out stealthily, I was getting near 100 tokens per second. Streamed tokens still flow out of it smoothly.

– completion: time 3.426s, 184 tokens, 53.7 tokens/s –
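For reference, a minimal sketch of how the two request shapes differ. The field names follow the public OpenAI REST API; the prompt text itself is just an example:

```python
# Two request payloads for the same task: chat style vs. completion style.

chat_request = {
    # POST https://api.openai.com/v1/chat/completions
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful dictionary."},
        {"role": "user", "content": "Define 'platform' and give synonyms and etymology."},
    ],
    "max_tokens": 300,
}

completion_request = {
    # POST https://api.openai.com/v1/completions
    "model": "gpt-3.5-turbo-instruct",
    # No "messages": the instruction goes in a single free-form prompt string.
    "prompt": "Give the definition, synonyms, and etymology of the word 'platform'.",
    "max_tokens": 300,
}
```

The endpoints are not interchangeable: sending a `messages` array to the completions endpoint (or a bare `prompt` to chat completions) is rejected.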

1 Like

Wow, that’s interesting: gpt-3.5-turbo-instruct completed the request in 3 seconds, which is acceptable.

I’m in Toronto, Canada, averaging 2 Gbps up/down, so I don’t think it’s the connection speed.

Instruct is labeled as legacy; I’m not sure how we can build a production offering on this.

1 Like

That model is the replacement instruction-following completion model for models like text-davinci-003. They just have the playground labeled oddly because they also announced the endpoint was going away, but obviously it hasn’t.

1 Like

I had the same problem. I am using Lambda to interact with gpt-3.5-turbo. It generated about 567 tokens in 90 seconds :sob:

Will gpt-3.5-turbo-instruct also work for large content generation?

It has the same context length of 4,096, and it is easier to get long output from it because it doesn’t have the excessive ChatGPT training. It behaves differently, and is still more like a completion model than an instruct model, so you’ll need to re-engineer your prompts.

1 Like

Got the same issue, I’m a PLUS user as well.
Using the gpt-3.5-turbo model, it takes 50 seconds to create 490 tokens.

"usage": {
    "prompt_tokens": 145,
    "completion_tokens": 345,
    "total_tokens": 490
}
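To compare reports across the thread, it helps to express those numbers as throughput (completion tokens divided by wall-clock time); the 50-second figure is the one quoted above:

```python
# Usage object as returned in the post above, plus the reported wall-clock time.
usage = {"prompt_tokens": 145, "completion_tokens": 345, "total_tokens": 490}
elapsed_s = 50.0

# Only completion tokens count toward generation speed; the prompt is
# processed in a single forward pass, not token by token.
tokens_per_second = usage["completion_tokens"] / elapsed_s
print(f"{tokens_per_second:.1f} completion tokens/s")  # -> 6.9
```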

I’m in San Francisco, CA. It used to take less than 5 seconds before; it’s been slowing down since this afternoon PST. I don’t see any red bars on the ChatGPT status page.
The Playground also seems way slower than before. Is anyone else encountering the same issues?

1 Like

It appears that my business has run into the same issue as yours today, with the GPT-3.5 Turbo API becoming unusually slow.

I am also getting about 10 tokens/second from the GPT-3.5 API, which is very slow compared to a few days ago. I am receiving complaints from customers who have to wait 30 seconds to 1 minute for the usual 300 to 600 tokens per response that my business requires. It was much faster before.

You may not believe this, but I’ve figured out what’s wrong with the speed of gpt-3.5-turbo. I think it is account-related.

I did some experiments using the playground by giving it a prompt like “Give me a full dictionary of ‘platform’.”

It was slow as hell with my production account. But after I switched to another account I’d saved for emergencies, the speed became normal!

It’s not that I did anything funny with my account: I’ve paid all its bills, and it hasn’t gotten any warning emails.

If you’re in a hurry, I suggest experimenting with new or spare accounts.

Interesting that you would find a link to a particular account.

Describe this “paying all its bills” part, though.

In the API, you are either:

  • on monthly billing, where you get billed the following month for your usage;
  • on a prepay plan, where your API calls are simply denied if you don’t have sufficient credits.

(I don’t think OpenAI would warn you by email against giving them money, either :wink:)

Whether it’s a monthly-billed account or a prepaid account doesn’t seem to matter.

One of my good old monthly-billed accounts has gotten slow, one of my prepaid accounts is slow, and a free trial account of mine is slow, too.

I won’t be surprised if OpenAI is running some funny experiments. I’ve read several opinions expressed in this forum that OpenAI treats small-time API users as test subjects. They may be experimenting with throttling output to a human-readable speed.

It feels like luck that one of my spare accounts is not yet affected.

I’ve recorded a video to prove my point: https://www.youtube.com/watch?v=f2Y_3tgWMXI

The one on the left is the working account; the one on the right is the slow one. The prompt for both is: “Give me a full dictionary for the word ‘platform’.”

We are affected too, and we are in the UK. Not sure why it’s so slow. Has OpenAI said anything?

I have reported this post to OpenAI help, but I only got a standard answer (as expected).

Several of my accounts are slow as hell. Only one of my spare accounts has normal speed, so I’m keeping the good one as a last resort.

I can only guess that some accounts are being assigned to crowded nodes (maybe deliberately?).

Slow ones generate at a human-readable speed. If you’re streaming, it’ll at least be bearable for users. But if you’re not streaming, your service is as good as dead. I mean, who’s going to wait 30 to 50 seconds with no output? Users will cancel and go away, but you are still billed for the tokens.
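To illustrate the streaming point, here is a small simulation. The generator stands in for a streamed response; a real call would pass `stream=True` to the chat completions endpoint, but the latency math is the same:

```python
import time

def simulated_stream(n_tokens: int, tokens_per_s: float):
    """Stand-in for a streamed API response at a fixed throughput."""
    for i in range(n_tokens):
        time.sleep(1.0 / tokens_per_s)
        yield f"tok{i} "

start = time.monotonic()
first_token_at = None
for chunk in simulated_stream(n_tokens=20, tokens_per_s=100.0):
    if first_token_at is None:
        first_token_at = time.monotonic() - start  # time to first token
total = time.monotonic() - start  # time to full response

# With streaming, the user sees output after roughly 1/throughput seconds;
# without it, they stare at a blank screen for the full `total` duration.
print(f"first token after {first_token_at:.3f}s, full response after {total:.3f}s")
```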

2 Likes

I’m having slow responses from gpt-3.5-turbo-16k: from 20 seconds to 180 seconds today.
And for other simple prompts, from 3 seconds to 30 seconds.

Is this related to anything I can correct on my end?

Did the symptom start today? If so, more and more people are getting affected by this problem.

Interestingly, GPT-4 is not affected.

Also, I’ve created a new account from a brand-new IP address as a test. It was the same: slow.

I have the same issue; we noticed it starting Friday, Oct. 13. I use gpt-3.5-turbo-16k-0613, and it was taking more than 3 minutes. Today:

gpt-3.5-turbo-0613

result:  {
  object: 'chat.completion',
  created: 1697364500,
  model: 'gpt-3.5-turbo-0613',
  choices: [ { index: 0, message: [Object], finish_reason: 'stop' } ],
  usage: { prompt_tokens: 2436, completion_tokens: 1120, total_tokens: 3556 }
}
time:  26.156s

gpt-3.5-turbo-16k-0613

result:  {
  object: 'chat.completion',
  created: 1697364546,
  model: 'gpt-3.5-turbo-16k-0613',
  choices: [ { index: 0, message: [Object], finish_reason: 'stop' } ],
  usage: { prompt_tokens: 2436, completion_tokens: 787, total_tokens: 3223 }
}
time:  92.874s
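Converting those two runs to throughput (completion tokens divided by wall-clock time, taken from the output above) makes the gap clearer:

```python
# (completion_tokens, elapsed seconds) from the two runs pasted above.
runs = {
    "gpt-3.5-turbo-0613": (1120, 26.156),
    "gpt-3.5-turbo-16k-0613": (787, 92.874),
}

rates = {model: tokens / seconds for model, (tokens, seconds) in runs.items()}
for model, rate in rates.items():
    print(f"{model}: {rate:.1f} tokens/s")
# -> roughly 42.8 tokens/s for the 4k model vs 8.5 tokens/s for the 16k model
```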
1 Like

Same for me. It looks like OpenAI uses some algorithm to temporarily slow down some API accounts. It has happened to my API account a few times in the past; then, within a few days, everything was working as usual.

Same issue: using gpt-3.5-turbo-16k, I’ve gone from an average response of 46 seconds to over 300 seconds now. Many requests are even timing out at 600 seconds. Rerunning the same context and comparing the results shows this.

I also tried creating a new account and using a different key, but had the same issues. This is causing huge problems with my client base and is killing me.

3 Likes