"That model is currently overloaded with other requests." error when using gpt-3.5-turbo

I am a pay-as-you-go user and use the ChatGPT API in a Python program to process a CSV file containing thousands of rows of short paragraphs. The program always processes around 20-30 rows successfully, then stops with an error message:

'message': 'That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID XXXX in your message.)'

I have tried many different times, and every time this happens at some point.

I checked status.openai.com, and it says all models are fully operational. What can I do at this point?

The rate limits are pretty generous - 3,500 requests per minute / 350k tokens per minute on gpt-3.5 models, as long as you’ve had the account for more than 48 hours. But the models do get overloaded from time to time anyway, and it doesn’t tend to make it to the status page!

If, however, you are on the GPT-4 API, the limits drop to circa 200 requests per minute - which model are you using?

Try adding a throttle to the requests, maybe?
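
For example, a minimal sketch, assuming you’re looping over CSV rows and that roughly one request per second keeps you under your limit (process_row and paragraphs.csv are placeholders for your own API call and file):

import csv
import time

with open("paragraphs.csv", newline="") as f:
    for row in csv.reader(f):
        process_row(row)  # placeholder for your OpenAI API call
        time.sleep(1.0)   # throttle: roughly 60 requests per minute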

I have the same issue. I’m using the GPT-3.5 model with the Python API. I’ve already written to the Help Center a couple of times, but nothing. In the last few days it has been impossible to use the model: 40-50 seconds in the best case to obtain a response, and 9 times out of 10 the horrible message "The model is currently overloaded…".
It is impossible to use the API and I can’t do anything. Please provide a "CONCRETE" response and, please, resolve this situation once and for all.
We are customers, and we are paying (pay-as-you-go).
Thank you in advance.

I use gpt-3.5-turbo. For now, I’ve found a temporary solution by adding retry logic to my code. This way, at least the program itself won’t be interrupted and will continue to process the rest of the data.

import time  # needed for the backoff sleep

# Fragment from inside my request function; retries and max_retries
# are initialized before the loop.
result = response.json()
if 'choices' in result:
    address = result['choices'][0]['message']['content']
    return address
else:
    retries += 1
    print(f"Request failed. Retrying ({retries}/{max_retries})...")
    time.sleep(2 ** retries)  # exponential backoff delay
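
One design note on the backoff: capping the delay (e.g. time.sleep(min(2 ** retries, 60))) stops a long run of failures from sleeping for minutes at a time, and adding a little random jitter helps when many clients are retrying at once.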

My guess is that, unless you have usage at the $5k/month-and-up level, the “paying customer” argument isn’t very strong.

Here’s the thing, though – I’m doing an enterprise use case, not a consumer use case. I’d be happy to pay 10x what I pay now, plus a monthly base charge, if I could get enterprise-level support and some performance guarantees … but I think OpenAI is in startup mode, where moving quickly on product is more important than maximizing revenue.

I think you are spot on with your assessment here - we do all have to remember that they are still doing all of this in beta, and whilst we are paying for it, they have no SLAs and the service seems to be provided as-is.

I enjoy using OpenAI and the Azure implementation, but for enterprise use cases I’m leaning more towards the other options, or even self-hosting - it depends on the budget, but I’d be afraid to launch anything commercial on the back of a beta platform with no support, as end users don’t like it when we blame a service that has no backup.

Out of interest, have you considered using any of the other models? It depends a lot on the use case, but it feels more and more that OpenAI is not currently in the mode of giving support or moving to a supported V1 product.

While the status code is 429, this is not a rate-limiting error. It indicates that the model itself is overburdened and has nothing to do with you or your account. It’s clearly documented in OpenAI’s error-handling documentation.

There is nothing to do about it other than retry.
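
If you’re calling it from Python, one common pattern (a sketch assuming the pre-v1 openai package and the tenacity library, which OpenAI’s cookbook suggests for this) is to wrap the call with automatic backoff:

import openai
from tenacity import retry, stop_after_attempt, wait_random_exponential

# Retry on any exception (including the overloaded-model 429) with
# randomized exponential backoff, giving up after six attempts.
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(**kwargs):
    return openai.ChatCompletion.create(**kwargs)

response = completion_with_backoff(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)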

Facebook’s isn’t commercially licensed, Google’s isn’t actually available outside a small set of selected alpha developers, and the rest aren’t as good as GPT-3.5. (And I need at least that level of capability.)

On a parallel track, I’m trying a fine-tuning approach on one of the better open source ones, we’ll see how that goes. At least it’s smaller so it infers faster…

I’m doing the same - I will update here with how we get on… I think it is sadly becoming the only option

I’m using gpt-3.5-turbo-0613 and finding the responses much better; I expect a lot of folks use plain gpt-3.5-turbo. Before that I used the March snapshot model. I’m generating chat completions from provided contexts, so a snapshot is perfectly fine for my purpose.