I am using the gpt-3.5-turbo model for development and calling the API endpoint https://api.openai.com/v1/chat/completions. However, I'm experiencing highly unstable response times, ranging from 4 to 60 seconds. What could be causing this? Does it have anything to do with the token count, and are there ways to improve response speed?
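For reference, here is roughly how the calls are being timed (a minimal sketch, assuming the pre-1.0 `openai` Python package and `OPENAI_API_KEY` set in the environment; the prompt and `max_tokens` value are placeholders). Streaming at least separates time-to-first-token from total generation time, and capping `max_tokens` bounds how long generation can run:

```python
import os
import time

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

start = time.time()
# stream=True returns tokens as they are generated, so time-to-first-token
# and total time can be measured separately; max_tokens bounds output length.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
    stream=True,
)

first_token_at = None
for chunk in response:
    if first_token_at is None:
        first_token_at = time.time()
    delta = chunk["choices"][0].get("delta", {})
    print(delta.get("content", ""), end="", flush=True)

print(f"\nfirst token: {first_token_at - start:.1f}s, "
      f"total: {time.time() - start:.1f}s")
```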
There's a lot of interest in the models after the 16k announcement, and AI of all kinds is getting used much more now that big chunks of Reddit have gone dark, so load on the entire system is very high at the moment.
But when I use GPT on poe.com, the response time is much faster than when I invoke the API. Why is that? Aren't they using the API as well?
My guess is that they are using an Azure instance, possibly a dedicated one, since their user base is large.
There really is a serious problem to be addressed, and it has intensified today. I have records of requests that took up to 15 minutes to receive an error response (Bad Gateway). Requests that do get answered, which previously took at most 4 to 6 seconds, are now taking 30 to 40 seconds. Also, wouldn't it make sense to split traffic between normal consumers and those causing the slowdowns? We consumers and developers invest resources in building applications on this API; I believe we deserve at least the respect of a proper response and a forecast of when service will return to normal.
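As a stopgap, a client-side timeout at least keeps a hung request from blocking for 15 minutes (a rough sketch, assuming the pre-1.0 `openai` Python package, which accepts a `request_timeout` argument and picks up `OPENAI_API_KEY` from the environment):

```python
import openai

try:
    # request_timeout makes a hung request fail fast instead of hanging
    # for minutes; the caller can then retry, ideally with backoff.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "ping"}],
        request_timeout=20,  # seconds
    )
except openai.error.Timeout:
    print("Request timed out after 20s; retrying may help.")
```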
Not slow at all for me, but the model has gotten dumber (in my view).