It seems some are indeed getting slower performance than others. Are you in Europe? Antarctica?
One thing you can test is how fast the model gpt-3.5-turbo-instruct works for you (it needs the completion endpoint and a different prompting style than "messages"). When it first came out, stealthily, I was getting near 100 tokens per second. Streamed tokens still flow out of it smoothly.
– completion: time 3.426s, 184 tokens, 53.7 tokens/s –
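If you want to reproduce a measurement like that yourself, here's a minimal sketch. It assumes the `openai` Python SDK v1.x and an `OPENAI_API_KEY` environment variable; the prompt is just a placeholder. The `tokens_per_second` helper matches the arithmetic in the figure above (184 tokens / 3.426 s ≈ 53.7).

```python
import os
import time


def tokens_per_second(n_tokens, elapsed_s):
    """Throughput in tokens/s, rounded to one decimal like the figure above."""
    return round(n_tokens / elapsed_s, 1)


if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    t0 = time.monotonic()
    # gpt-3.5-turbo-instruct uses the completions endpoint, not chat.completions
    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt="Write a short paragraph about network latency.",
        max_tokens=200,
    )
    elapsed = time.monotonic() - t0
    used = resp.usage.completion_tokens
    print(f"completion: time {elapsed:.3f}s, {used} tokens, "
          f"{tokens_per_second(used, elapsed)} tokens/s")
```

Run it a few times at different hours; the tokens/s figure is what varies when capacity is crowded.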
That model is the replacement instruction-following completion model for models like text-davinci-003. The Playground just has it labeled oddly, because they also announced the endpoint was going away, but obviously it hasn't.
It has the same 4,096-token context length, and it's easier to get large outputs from it because it doesn't have the excessive ChatGPT training. It behaves differently, and is still more like a completion model than an instruct model, so you'll need to re-engineer your prompts.
I'm in San Francisco, CA. It used to take less than 5 seconds before. It's been slowing down since this afternoon (PST). I don't see any red bars on the OpenAI status page.
The Playground also seems way slower than before. Is anyone else encountering the same issues?
I am also getting about 10 tokens/second from the GPT-3.5 API, which is very slow compared to a few days ago. I am receiving complaints from customers who now have to wait 30 seconds to 1 minute to get the usual 300 to 600 tokens per response that my business requires. It was much faster before.
Whether it’s a monthly-billed account or a prepaid account doesn’t seem to matter.
One of my long-standing monthly-billed accounts has gotten slow, one of my prepaid accounts is slow, and a free-trial account of mine is slow too.
I wouldn't be surprised if OpenAI is running some funny experiments. Several opinions expressed in this forum suggest OpenAI treats small-time API users as test subjects. They may be experimenting with throttling output to a human-readable speed.
It feels like luck that one of my spare accounts is not yet affected.
I have reported this issue to OpenAI help, but I only got a standard answer (as expected).
Several of my accounts are slow as hell. Only one of my spare accounts has normal speed, so I’m using the good one as a last resort.
I can only guess that some accounts are being assigned to crowded nodes (maybe deliberately?).
The slow ones generate at a human-readable speed. If you're streaming, that's at least bearable for users. But if you're not streaming, your service is as good as dead. I mean, who's going to wait 30~50 seconds with no output? Users will cancel and go away, but you are still billed for the tokens.
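If you're not already streaming, switching is mostly a matter of passing `stream=True` and iterating the chunks, so users see output within a second or two even when total generation is slow. A minimal sketch, assuming the `openai` Python SDK v1.x and an `OPENAI_API_KEY` in the environment; the model and prompt are just examples:

```python
import os
import time


def time_to_first_chunk(chunk_iter):
    """Consume an iterator of text chunks; return (seconds to first chunk, full text)."""
    start = time.monotonic()
    first = None
    parts = []
    for text in chunk_iter:
        if first is None:
            first = time.monotonic() - start
        parts.append(text)
    return first, "".join(parts)


if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
        stream=True,
    )
    # Each streamed chunk carries a delta; content can be None on the first/last chunk
    chunks = (c.choices[0].delta.content or "" for c in stream)
    ttfc, text = time_to_first_chunk(chunks)
    print(f"first chunk after {ttfc:.2f}s; {len(text)} chars total")
```

Time-to-first-chunk is the number that matters for perceived responsiveness; total generation time can stay slow without users bailing out.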
Same for me. It looks like OpenAI uses some algorithm to temporarily slow down certain API accounts. It happened to my API account a few times in the past; then after a few days everything was working as usual.
Same issue. Using gpt-3.5-turbo-16k, I've gone from an average response time of 46 seconds to over 300 seconds now, and many requests are even timing out at 600 seconds. Rerunning the same context and comparing the results confirms this.
I also tried creating a new account and using a different key, but I get the same issue. This is causing huge problems with my client base and is killing me.