Unstable speed of gpt-3.5-turbo-16k

Hi all,

We are intermittently getting very slow responses from gpt-3.5-turbo-16k.
A request that used to take 18 seconds went up to more than 3 minutes about a month ago. It then dropped back to 20 seconds, rose to 1.8-2.8 minutes last week, fell to 1 minute, then 30 seconds, and yesterday it was 45 seconds.

OpenAI support has not been helpful; their context window is less than 2K tokens, I think :frowning:

Is there any explanation for this inconsistency?
Is there a way to get consistently fast responses?
If not, is there a production-ready alternative with a large context window?

We are on tier 3, as we are still testing and demoing to our customers and prospects.
The use case is structuring data as JSON (which is reflected in a UI), with the ability to modify the result through conversation.
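
Roughly, each request looks like the following simplified sketch (the prompts, fields, and follow-up turn are placeholders, not our real data):

```python
import json
from openai import OpenAI  # assumes openai-python >= 1.0

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Conversation history is kept so the user can refine the structured result.
messages = [
    {"role": "system", "content": "Extract the entities from the user's text "
                                  "and return them as a single JSON object."},
    {"role": "user", "content": "ACME Corp ordered 40 units of SKU-123 on 2024-01-05."},
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",
    messages=messages,
    temperature=0,
)

# The JSON is parsed and rendered in the UI.
data = json.loads(response.choices[0].message.content)
print(data)

# A follow-up turn lets the user modify the result conversationally.
messages.append({"role": "assistant", "content": response.choices[0].message.content})
messages.append({"role": "user", "content": "Change the quantity to 45."})
```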

Thanks

Some variation is normal, I’m afraid. You could consider accessing the API via the Azure OpenAI Service. I have only been using it for GPT-4-turbo in select use cases so far, but I have found its performance to be fairly stable.
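
Switching is mostly a matter of pointing the client at your Azure resource, roughly like this (the endpoint, key, API version, and deployment name below are placeholders for your own values):

```python
from openai import AzureOpenAI  # openai-python >= 1.0

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR_AZURE_OPENAI_KEY",                          # placeholder
    api_version="2023-12-01-preview",  # use a version your resource supports
)

# On Azure you pass your *deployment* name, not the raw model name.
response = client.chat.completions.create(
    model="my-gpt-35-turbo-16k-deployment",  # placeholder deployment name
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```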

3 minutes of response time?! How big are your prompt and expected completion, and what’s the use case?

I have been getting response times exceeding 5 minutes when outputting 6.5k tokens with 3.5-turbo-16k. This model is a godsend, but its latency can be insane sometimes.
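
When it gets that slow I time the call and work out the throughput from the usage numbers, roughly like this (the prompt and max_tokens are placeholders, not my actual workload):

```python
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
response = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",
    messages=[{"role": "user", "content": "Generate a long code sample ..."}],  # placeholder
    max_tokens=6500,
)
elapsed = time.perf_counter() - start

# Tokens per second makes it easy to compare a "good" run against a slow one.
completion_tokens = response.usage.completion_tokens
print(f"{elapsed:.1f}s total, {completion_tokens} completion tokens, "
      f"{completion_tokens / elapsed:.1f} tokens/s")
```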

Is it just today? I am having some issues with speed today as well (for the 1106 models).

It’s not just today; I have sporadically experienced extremely bad latency with 3.5-16k. My use case is code generation. In my testing I haven’t found an alternative to this model; even gpt-4-1106 has a maximum of only 4096 output tokens.

Are you seeing the slow speeds from your own computer or from a server?

What I noticed is that, for me, the API is much slower from my computer than from my hosting server (I live in Indonesia, and even though the internet here shows good metrics on Speedtest, it can sometimes be very poor for certain services).
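
One way I try to separate the two is to stream the response and compare the time to the first token (roughly request overhead and queueing) with the total time (mostly generation). A rough sketch I run from both my laptop and the hosting server (model and prompt are just placeholders):

```python
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",
    messages=[{"role": "user", "content": "Write a short paragraph about latency."}],  # placeholder
    stream=True,
)

for chunk in stream:
    # Some chunks carry no content delta, so guard before counting.
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter() - start
        chunks += 1

total = time.perf_counter() - start
print(f"first token after {first_token_at:.2f}s, finished in {total:.2f}s, {chunks} chunks")
```

If the time to first token is similar in both places but the total time is much worse locally, the bottleneck is more likely the connection than the model.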