Getting rate limit error that specifies incorrect rate limit

I am in usage tier 5, and the limits page in settings gives my RPM limit on gpt-3.5-turbo as 10k. However, I am getting this error message fairly frequently:

{
  "error": {
    "message": "You've exceeded the 200 request/min rate limit, please slow down and try again.",
    "type": "invalid_request_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}

This error can occur regardless of which threads endpoint I call, but in this example the endpoint was:
/v1/threads/runs

Is there something different with the threads API or is this a case of my rate limit not being respected correctly?

The Assistants API has an undocumented rate limit on the API calls themselves, perhaps to keep it in "beta" for now. What you report is an increase from the long-standing limit of 60 requests per minute, which could be exhausted just by polling for a response to be completed.
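
Since every status check counts as one request against that limit, one way to reduce the pressure is to poll with growing pauses instead of a tight loop. Below is a minimal sketch, not an official client pattern. `fetch_status` is a hypothetical callable that you would write to make one status request and return the run's status string, and the set of "finished" statuses is an assumption you should check against the current API reference.

```python
import time
from typing import Callable, FrozenSet

# Assumed set of statuses at which polling should stop; verify against the API docs.
DEFAULT_TERMINAL: FrozenSet[str] = frozenset(
    {"completed", "failed", "cancelled", "expired", "requires_action"}
)


def poll_until_done(
    fetch_status: Callable[[], str],
    initial_delay: float = 1.0,
    max_delay: float = 16.0,
    max_attempts: int = 20,
    terminal: FrozenSet[str] = DEFAULT_TERMINAL,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status repeatedly, doubling the pause after each unfinished poll.

    fetch_status is a hypothetical helper: it should perform exactly one
    request and return the current status string.
    """
    delay = initial_delay
    for _ in range(max_attempts):
        status = fetch_status()  # one request counted against the per-minute limit
        if status in terminal:
            return status
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped at max_delay
    raise TimeoutError("run did not reach a terminal status within max_attempts polls")
```

With the defaults, a run that takes about 30 seconds costs roughly five status requests instead of the dozens a sub-second polling loop would make.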

Does this rate limit apply per API key or per assistant?

It is an organization-wide rate limit, and it applies to the number of calls to the endpoint, not their contents.
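
Because the limit counts calls rather than content, a client-side throttle placed in front of every Assistants request can help you stay under it. The sketch below is a simple sliding-window limiter, offered as an illustration only. The `RequestLimiter` class and its parameters are hypothetical names, and the figure of 200 per minute is taken from the error message above rather than from documentation. It only covers a single process, so other processes or keys in the same organization still draw from the same budget.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque


class RequestLimiter:
    """Allow at most max_requests calls in any window_seconds span (per process)."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()  # timestamps of recent calls
        self._lock = threading.Lock()

    def acquire(self) -> None:
        """Block until one more request fits inside the window, then record it."""
        while True:
            with self._lock:
                now = self._clock()
                # Forget calls that have aged out of the window.
                while self._stamps and now - self._stamps[0] >= self._window:
                    self._stamps.popleft()
                if len(self._stamps) < self._max:
                    self._stamps.append(now)
                    return
                # Time until the oldest recorded call leaves the window.
                wait = self._window - (now - self._stamps[0])
            self._sleep(wait)  # sleep outside the lock so other threads can proceed
```

In use, you would create one limiter with some headroom, for example `RequestLimiter(180)`, and call `limiter.acquire()` immediately before each request to any Assistants endpoint, including status polls.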

Does this same rate limit exist for the chat completions API?

No. For chat completions, I do not know of any practical limit at which you start to get cut off, other than the model-based request limits and the limits on encoded tokens.