Rate Limit even though Limit not breached

I'm getting the following error when invoking the ChatGPT API:

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..

Attached is a screenshot of my actual billing usage:

and quota usage:

MODEL          TOKEN LIMITS   REQUEST AND OTHER LIMITS
gpt-3.5-turbo  80,000 TPM     5,000 RPM

My production system has been down since this morning because of this. I'm not sure what else to check or do.

Have you tried a lower model, say GPT-3.5? Are you getting the same error?

We are getting the rate-limit error on gpt-3.5-turbo only.

What has worked for me sometimes is to use a backup account. Sometimes it’s just your account and the best thing to do is create a new one.

For my SaaS that's in prod, I have two API accounts and some logic that switches between them. You could even randomize it.

The feedback above is akin to "turn it off and on again", but sometimes it just works.
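The two-account switchover described above could be sketched like this. The key values, helper names, and the bare `Exception` catch are all illustrative placeholders, not the actual SaaS logic; in practice you would catch only the rate-limit exception from your client library.

```python
import itertools
import random


class KeyRotator:
    """Cycle (or randomize) over a pool of API keys from separate accounts."""

    def __init__(self, keys, randomize=False):
        self.keys = list(keys)
        self.randomize = randomize
        self._cycle = itertools.cycle(self.keys)

    def next_key(self):
        if self.randomize:
            return random.choice(self.keys)
        return next(self._cycle)


def call_with_failover(rotator, call, attempts=None):
    """Try the call with each key in turn; re-raise the last error if all fail."""
    attempts = attempts or len(rotator.keys)
    last_exc = None
    for _ in range(attempts):
        key = rotator.next_key()
        try:
            return call(key)
        except Exception as exc:  # narrow this to RateLimitError in real code
            last_exc = exc
    raise last_exc
```

A caller would wrap its actual API invocation in a function taking the key, e.g. `call_with_failover(KeyRotator(["sk-primary", "sk-backup"]), my_api_call)`.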

The funny thing is that this account is my backup for Azure OpenAI, and the backup is exactly where it's failing.

Rate limits can be quantized, meaning they are enforced over shorter periods of time (e.g. 60,000 requests/minute may be enforced as 1,000 requests/second). Sending short bursts of requests or contexts (prompts+max_tokens) that are too long can lead to rate limit errors, even when you are technically below the rate limit per minute.
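Because of that quantization, a client-side limiter that smooths bursts can help: if a per-minute quota is enforced in roughly one-second slices, keeping your request rate below that slice avoids spurious rate-limit errors even while you are well under the per-minute total. A minimal sliding-window sketch, with illustrative numbers (this is not part of the OpenAI SDK):

```python
import time
from collections import deque


class BurstLimiter:
    """Allow at most `max_calls` requests per `period` seconds.

    Call `wait()` before each API request; it sleeps just long enough
    to keep short bursts under the enforced per-slice limit.
    """

    def __init__(self, max_calls, period=1.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()  # timestamps of recent requests

    def wait(self):
        while True:
            now = time.monotonic()
            # Discard timestamps that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest call leaves the window.
            time.sleep(self.period - (now - self.calls[0]))
```

For a 5,000 RPM limit enforced in one-second slices, something like `BurstLimiter(max_calls=80, period=1.0)` would keep bursts comfortably under the per-second slice.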