Is there an issue with GPT-3.5 Turbo 16k?

Is there an ongoing issue with GPT-3.5 Turbo 16k? I am getting timed out, and even when it does respond, the responses take 5 minutes. What is going on? I am getting annoyed, and if this is the new norm, I'll just set up a local LLM pipeline to handle the workload.

Hi and welcome to the Developer Forum!

Does the issue continue if you try from the playground?

Looking at the playground, it's just really, really slow to fill out the response, and I also noticed that no matter what I do, it never outputs more than a 1700-token response. So I have to chunk everything.

I am just surprised that it's this bad. I know OpenAI is looking into cost cutting, but it's gotten so bad, I've never seen it this slow before. It's so painfully slow that it takes 5 minutes to output 1700 tokens… that's about 5.7 t/s. Yikes!
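To double-check my math on that throughput figure:

```python
# Back-of-the-envelope throughput: ~1700 tokens in ~5 minutes
tokens = 1700
seconds = 5 * 60
print(round(tokens / seconds, 1))  # -> 5.7 tokens/sec
```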

I hope this is a transition and not the new norm.

It could be that you are now using the model quite heavily, and since this is the first time this has happened, you may have been moved to a higher-latency model.

But it shouldn't be this slow for a single prompt, should it? Not only does it generate output slowly, but no matter how much input I feed it, it only outputs 1700 tokens max; I could never get it to output 2500. There is no option to force it to "generate" more. On a local LLM, I can force it to generate up to a specific token count if I need longer output. For example:

I say "take this text and add HTML markup to it." I gave it 4000 tokens' worth of text, and it only returned 1700 tokens' worth of output. On the API that shouldn't be the case, and it's also doing it at about 5.7 tokens per second… so even if I chunk it and do it in parts, it would take 20 minutes to get back 2500 words!
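In case it helps anyone hitting the same wall, here is a rough sketch of how I'm chunking the input so each request stays under that output ceiling. Word-based splitting is only an approximation of token counts (`tiktoken` would be exact), and the API call at the bottom is an untested sketch, not a verified snippet:

```python
def chunk_text(text, max_tokens=1500, words_per_token=0.75):
    """Split text into chunks of roughly max_tokens tokens each.

    Uses a word-count approximation (~0.75 words per token for
    English); swap in tiktoken for exact token budgeting.
    """
    max_words = int(max_tokens * words_per_token)
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

# Each chunk then goes out as its own request, e.g. (untested sketch):
# for chunk in chunk_text(doc):
#     resp = openai.ChatCompletion.create(
#         model="gpt-3.5-turbo-16k",
#         max_tokens=2000,
#         messages=[{"role": "user",
#                    "content": "Add HTML markup to this text:\n" + chunk}],
#     )
```

It's ugly, but it keeps each response comfortably under the cap I'm seeing.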

You might reach out via the support bot in the bottom-right corner and let them know you are experiencing slow responses.