Is the API freezing up? Having weird intermittent behaviors

I am running a bot that uses ChatGPT (the API, with the gpt-3.5-turbo-1106 model).
The bot is able to make several requests in a row and get responses in a timely manner. However, it will then hang on a request for 10 to 15 minutes (or until I get tired of wondering what the hell is going on).
What makes me almost certain it’s some problem with the API is that the bot will continue working once a response is received, go smoothly for a few more requests, and then boom, it happens again.
I never saw this behavior before the whole Sam thing in the news. Pretty sure there’s some internal shit going on.


What tier is your API account? It sounds like you may be hitting a rate limit. Do you have any error protection code around your API calls?

  1. I didn’t know we had “tiers” on API usage. The only difference I’ve ever seen was “if you’ve made a successful payment to our API, then you have access to gptVision”. But I’ve made several successful payments, and I’ve been using this API under this same account for about a year now.
  2. I don’t know what “error protection code with your API calls” is, but if you mean the API call being invalid and returning a 400-500 response, I would most likely catch that, as I am pretty good at JavaScript at this point.
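For reference, the “error protection code” being asked about usually means more than catching a 4xx/5xx once: it means retrying transient failures with backoff. A minimal sketch of that idea (the `withRetries` helper name and its parameters are illustrative, not from any post above):

```javascript
// Retry a flaky async call with exponential backoff.
// `fn` stands in for the actual API request (hypothetical stand-in).
async function withRetries(fn, maxAttempts = 3, baseDelayMs = 500) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // out of attempts, give up
      // back off: 500ms, 1000ms, 2000ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

Note this only helps with calls that *fail*; a call that simply never returns needs a timeout as well, which is a separate concern.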

I’m having the same problem. I’ve tried gpt-4, gpt-4-turbo, and gpt-3.5-turbo-1106, and the responses are generally very slow, on the order of minutes. I’m Tier 2 and have error handling… no rate limit errors or anything like that, it just takes forever. This started sometime yesterday for me. My requests have about a page or two of text worth of tokens for the model to parse and respond to, most of it JSON.


Yes, your description matches everything I’m seeing. I hope it gets better in a few days, but this is (one reason) why we should not be dependent on OpenAI alone, but use models from Anthropic and Hugging Face as well. I should already have had a backup in line.

Hanging forever usually indicates some sort of connection pooling issue. It’s easy to think of it like this: you have a phone line that can manage up to 5 calls. Instead of hanging up, you are just holding the line. When you try to make a 6th call, it sits and waits for a slot to open up, but that never happens.

You need to make sure you are properly closing/re-using connections when making multiple calls.
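The phone-line analogy can be sketched as a toy pool; the `Pool` class below is purely illustrative, not part of any HTTP library, but it shows exactly why a leaked slot turns into an infinite hang rather than an error:

```javascript
// Toy model of a fixed-size connection pool.
// If callers never release(), the next acquire() waits forever: no error,
// no timeout, just a pending promise - which looks like a "frozen" API call.
class Pool {
  constructor(size) {
    this.free = size;     // slots currently available
    this.waiters = [];    // callers queued for a slot
  }
  acquire() {
    if (this.free > 0) {
      this.free--;
      return Promise.resolve();
    }
    // no slot free: park the caller until someone releases
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  release() {
    const next = this.waiters.shift();
    if (next) next();     // hand the slot straight to the next waiter
    else this.free++;
  }
}
```

With a pool of 5, five unreleased `acquire()` calls leave the sixth pending indefinitely, which matches the “holding the line” picture above.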

Possible, but I’m using JS and Node, and have been for a long time when making these types of calls and apps, and I have never had an issue like this. I’m 90% sure this is a problem on OpenAI’s side, especially since one other person has confirmed the same problem in the same time frame.

This reminds me of when someone loses something and goes “well, I’ve looked everywhere”. No, you’ve only looked in the areas you are aware of.

If it were an OpenAI issue, you would not have only a single person (who is asking for a “page or two” of JSON). You would have hundreds or thousands of people spamming the forum.

I’m assuming that you’re using the OpenAI Node.js library. What version is it?

There are no currently popular open issues regarding connection problems.

Even if it were on OpenAI’s end, it makes sense to exhaust all your possibilities in the meantime.

I pointed out that I’m using Node because you referenced a Java article, and the principle may not translate. I pointed out that I have made many, many apps like this before to show how unlikely it is that this is because I have a few memory leaks after fewer than 100 calls.

As to how many confirmations I got: very few people use this stuff, and even fewer will take time out of their day to report an issue, here or on GitHub. Add to that that the issue is intermittent, a few days old, and likely not widespread; then yes, I will take one person describing in detail exactly what I am experiencing, including the time frame, as confirmation that something is up.

I am using "openai": "^4.19.1"

One of my bad qualities as a developer, but I don’t like to dig aimlessly, especially if I’m not sure that what I’m looking for exists. So the next option is to just try another LLM.

The article was supposed to demonstrate what connection leakage is, which happens across all languages. The principle 100% translates.

You’ll notice in the first paragraph it states:

Here I will discuss a number of techniques that could apply to many different languages and frameworks.

“Digging aimlessly” implies you don’t know what you’re doing, which you have now stated multiple times that you do.

Maybe you’re the first to notice. It could be very beneficial to put in the labor and dig for everyone else.

I don’t even know why you mentioned another LLM when this is clearly a client issue. Why not just write your own simple HTTP client wrapper?

You can spend the same amount of time waiting for a solution trying the same calls with curl. Test, test, and test some more. This is not “digging aimlessly”.
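A minimal sketch of what such a hand-rolled wrapper could look like, bypassing the npm package entirely. Assumptions: Node 18+ global `fetch`, and the publicly documented chat completions endpoint and payload shape; `fetchImpl` is injectable purely so the function can be exercised without the network:

```javascript
// Bare-bones chat completions call with a hard deadline, no openai package.
async function chatOnce(apiKey, messages, fetchImpl = fetch, timeoutMs = 60_000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs); // cap the wait
  try {
    const res = await fetchImpl("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "gpt-3.5-turbo-1106", messages }),
      signal: controller.signal, // abort turns a silent hang into an error
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`); // surface 4xx/5xx
    const data = await res.json();
    return data.choices[0].message.content;
  } finally {
    clearTimeout(timer); // always clear so the process can exit cleanly
  }
}
```

If this wrapper also hangs on the same prompts, the npm package is off the hook; if it doesn’t, the problem is somewhere in the client stack.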

Well, before looking at code, realize that:

  • It is not a rate limit by any chance. You should know the limits - they are CLEAR IN THE DOCUMENTATION, RTFM applies - but if you hit a rate limit you do not get a hang, you get an error with a description. The server rejects the request.
  • If you are not a total beginner, you should know to check service status pages, and how to use Google to find them.

OpenAI Status

is the status page for OpenAI. You can see various issues over the last few days - serious issues where it seems a lot of requests just got hung up. Depending on your time frame, this is not your fault; you are learning to work with a provider that at the moment seems to have some internal load issues.

Hi, I just came across your answer. I’ve been debugging for two days now why the API calls I make just freeze indefinitely. Your mention of a connection pooling issue is the first new hint in many hours…

To quickly describe: I use Google Cloud Functions in 2 environments (staging, production), which include calls to the chat completion API, using the npm openai package at version 4.17.0. My app doesn’t do a high volume of requests - up to 10-20 requests per minute at peak, but then nothing for half an hour. There can be concurrent requests, but only up to 5 or so.

I had used the staging project more extensively two days ago and at some point the calls just started hanging, never returning any response or error, just hanging.

The exact same prompt from the production environment worked.

Another, shorter prompt from the staging environment worked too. And this is what is baffling me: only the combination of a longer prompt (approx. 2000 tokens) and the staging project makes the call freeze. So something related to processing time? But then why only for that specific project?

Since you mentioned this connection pooling issue - it sounds applicable, but then why would a shorter prompt also work in the staging project?

You need to make sure you are properly closing/re-using connections when making multiple calls.

Does one have any control over that? Isn’t the openai package managing such things as connections?

Two days later, I found that I was DoSing my own logger, triggered by a nondeterministic case in a ChatGPT response, and that is why it hung. Not the call itself, but the log print afterwards.
So the only takeaway in the context of this post is perhaps: if the call hangs, it’s probably not the call itself but dev-caused errors around it.
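One cheap way to find out *which* step is actually hanging is to race each awaited step against a deadline, so the stalled step names itself instead of freezing silently. A small sketch (the `withDeadline` helper name and labels are illustrative):

```javascript
// Wrap any promise with a deadline; on timeout, reject with the step's label
// so the log tells you whether it was the API call, the logger, or something else.
function withDeadline(promise, ms, label) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out in: ${label}`)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

Used as `await withDeadline(callApi(), 60_000, "chat completion")` and `await withDeadline(log(result), 5_000, "logger")`, the logger-DoS case above would have surfaced in minutes rather than two days.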

And my question above still stands: connection handling should probably not have to be handled by the dev, right?