API calls extremely slow or never finish

For a good week now, I have had the problem that responses to simple requests via the REST API take several minutes, or no response arrives at all, whereas the Playground delivers results in seconds. A few weeks ago there were hardly any such problems.
What could be the reason for this?


Welcome back…

We’d need to know more information to take a guess, but it could be any number of things. If you search the forum, you’ll see a lot of similar posts that point to one thing or another.

Hope your development has been going well.

Yup. Too many 500 errors now, even though nothing shows up on the status page.

Hi Paul, since I don’t get a response from OpenAI, it’s hard to tell what’s going wrong. I am still generating quizzes from text input. Currently the problem seems to be that if I send a long text to OpenAI and tell it to generate 5 questions, everything is slow, but I do get a response. If I use exactly the same prompt with a higher number of questions (e.g. 20), I no longer get a response at all.

How many tokens are you sending? Are you setting max_tokens? What model?


GPT-4 and max_tokens 7,000

That should be absolutely enough, and if it weren’t, I would expect to get fewer answers than requested, or an error - but instead there is no response at all…

When generating 10 questions it still works (taking minutes), at a cost of about 2,000 completion tokens.
But 20 questions no longer work (two weeks ago this was no problem).
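For what it’s worth, the numbers above suggest the completion budget isn’t the bottleneck: if 10 questions cost roughly 2,000 tokens, 20 should fit comfortably in 7,000. A quick back-of-the-envelope check (the ~200-tokens-per-question figure is only an estimate derived from this thread):

```python
# Back-of-the-envelope check that a question count fits the completion
# budget. The ~200 tokens/question estimate comes from the thread
# (10 questions ~= 2,000 tokens) and is only an assumption.
def completion_budget_ok(n_questions, max_tokens=7000, tokens_per_question=200):
    """True if n_questions should fit within the max_tokens budget."""
    return n_questions * tokens_per_question <= max_tokens
```

By this estimate, 20 questions would need only ~4,000 tokens, so the hang looks like a latency/timeout issue rather than the model running out of room.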

So these are my observations -
Since the 19th Oct fix, I have seen the service go up and down like crazy for API calls. I am working in the IST time zone. Today, from about 12 pm to 4 pm, GPT-4 responses were super slow or timed out.
After 4 pm IST things are better, with only intermittent timeouts. Mornings are good.
Late evenings are also good. Not sure what happens in between.

Response time before 19th Oct was about 5 seconds; now the LLM ranges between 17 and 21 seconds. I am using the Azure OpenAI service. After 9 pm IST, I do get 7-second responses. I tried GPT-4-32K, but that also slows down during the afternoons, although its responses are about 10 seconds faster than GPT-4’s.

I am using it for a custom Q&A solution.

Can anyone help guide me?


You might try requesting 10 questions at a time and appending the results. It’s a tradeoff between context and performance: the bigger your prompt and response, the longer it takes to process.
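A minimal sketch of that batching idea - here `ask(n)` stands in for whatever wrapper you already have around the chat-completions endpoint (the name and shape are assumptions for illustration):

```python
def generate_in_batches(ask, total=20, batch_size=10):
    """Collect `total` questions via several smaller requests.

    `ask(n)` is any callable returning a list of n questions -- in
    practice a wrapper around the chat-completions endpoint. Smaller
    completions come back faster and are less likely to hit a timeout
    than one big request.
    """
    questions = []
    while len(questions) < total:
        n = min(batch_size, total - len(questions))
        questions.extend(ask(n))
    return questions
```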

Hi Paul, unfortunately that doesn’t work… if I send a text and generate questions about it several times, the questions are often very similar. I prevent this in the prompt with an instruction that the questions should not be too similar. Of course, I could send along the existing questions and exclude similar ones - but that would massively increase the length of the prompt again.

And I have found that at certain times even the prompt to generate 5 questions no longer works. At the same time, ChatGPT and the Playground are slower than normal, but they still do the job in under 10 seconds… It seems that API calls have a very low priority ;-|

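If calls fail only at certain times of day, one common workaround (a general pattern, not something confirmed in this thread) is to retry with exponential backoff; the attempt count and delays below are placeholder values:

```python
import time

def call_with_retries(make_call, max_attempts=4, base_delay=2.0):
    """Retry a flaky call with exponential backoff (2 s, 4 s, 8 s, ...).

    `make_call` is any zero-argument callable that raises on a 500
    error or timeout; the delay values are placeholders, not
    recommendations.
    """
    for attempt in range(max_attempts):
        try:
            return make_call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * 2 ** attempt)
```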

My next suggestion would be to improve the prompt(s) so that they work with GPT-3.5-turbo or GPT-3.5-turbo-instruct… at the much lower cost you can add a bit more context, and it should be quicker too.

The API will eventually speed up, but you have to take into consideration the speed at which OpenAI is growing as a company.

I’ve got the same problem. API calls take too long and are killed by the idle timeout all the time.
It’s so underwhelming.

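One way to avoid idle-timeout kills on long completions is to request a streamed response, so bytes keep flowing on the connection. Here is a sketch of the delta-joining part; the chunk shape assumed below matches the streaming events of the pre-v1 `openai` Python package, and in real use `events` would be the iterator returned by a `stream=True` chat-completions call:

```python
def collect_stream(events):
    """Join the content deltas of a streamed chat completion.

    `events` is an iterable of chunk dicts shaped like the API's
    streaming events (shape assumed from the pre-v1 `openai` package).
    """
    parts = []
    for event in events:
        delta = event["choices"][0]["delta"]
        if "content" in delta:  # some chunks carry only role/finish info
            parts.append(delta["content"])
    return "".join(parts)
```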

Check this site for the unofficial OpenAI status -

I can breathe now after checking this site; otherwise I would still be suspecting every bit of my code.
Hope that helps!


Thanks a lot for sharing this resource - it is really valuable!


The API response time and the quality of the responses are very different between 3.5 and 4.

I am building a personal-loan chatbot powered by the OpenAI API, using GPT-3.5-turbo. How can I do real-time sentiment analysis on the user’s chat? I pass the user’s chat history so that they can ask follow-up questions, and I want to determine from that history whether the user wants to take out a personal loan. The main problem is that, because of follow-up questions, the user never writes their whole query in one message; I’m assuming there will be 6-7 question-answer pairs. With that much chat data, how can I tell whether the user wants a personal loan, so that the chatbot can proceed?

I do not have any user data except this chat history with my chatbot.

What approach could I follow to determine the user’s intent?
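One common approach (my suggestion, not something from this thread) is a separate classification call: dump the transcript into a single request and ask the model for exactly one intent label. A sketch - the system instruction and the three labels are made up for illustration:

```python
def intent_messages(history):
    """Build a one-shot intent-classification request from chat history.

    `history` is a list of {"role", "content"} dicts; the system
    instruction and the label set are hypothetical examples.
    """
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return [
        {
            "role": "system",
            "content": (
                "Decide whether the user intends to take a personal loan. "
                "Reply with exactly one label: INTERESTED, NOT_INTERESTED, "
                "or UNCLEAR."
            ),
        },
        {"role": "user", "content": transcript},
    ]
```

You would send these messages in their own chat-completions request (temperature 0 tends to work well for classification) and branch on the returned label before letting the chatbot proceed.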