Rate limit error Tier 2 Account Rate Limit Issues with gpt-3.5-turbo

Is that via direct call or by assistants?

Is that using a chat with functions or some chat history?

Are you making non-streaming calls and logging the token usage reported in each response, along with the rate-limit headers returned with every request?

Any chance your timeouts are set low and you are hanging up on a model that is still generating your completion? The OpenAI libraries can retry in that scenario, so a single logical call may end up being sent more than once.

I have a theory that, because of how the limiter's algorithm works, the API gives you the equivalent of a “constant refill” of your rate limit. Firing off 80,000 tokens' worth of requests in the same second depletes your allowance, and pushes out the time until a full reset, more than sending the same amount spread over a minute.
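To make that theory concrete, here is a minimal sketch of a token-bucket limiter with continuous refill. This is only a model of how I suspect the limit behaves, not OpenAI's actual algorithm, and the `TokenBucket` class, the 80,000 per minute figure, and the one-second step are all assumptions picked for illustration.

```python
class TokenBucket:
    """Toy rate limiter: a bucket that refills continuously up to a cap."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.level = float(capacity)  # start with a full allowance

    def advance(self, seconds):
        """Let time pass; the bucket refills but never exceeds capacity."""
        refilled = self.level + seconds * self.refill_per_second
        self.level = min(self.capacity, refilled)

    def spend(self, tokens):
        """Take tokens if available; return False if that would overdraw."""
        if tokens > self.level:
            return False
        self.level -= tokens
        return True

    def seconds_until_full(self):
        return (self.capacity - self.level) / self.refill_per_second


LIMIT_PER_MINUTE = 80_000            # illustrative figure, not a quoted limit
REFILL = LIMIT_PER_MINUTE / 60       # tokens restored per second

# Case 1: the whole minute's allowance spent in one instant.
burst = TokenBucket(LIMIT_PER_MINUTE, REFILL)
burst.spend(LIMIT_PER_MINUTE)

# Case 2: the same total spent evenly, one slice per second for a minute.
spread = TokenBucket(LIMIT_PER_MINUTE, REFILL)
for _ in range(60):
    spread.spend(REFILL)
    spread.advance(1)

print("burst:  remaining", burst.level, "full again in", burst.seconds_until_full(), "s")
print("spread: remaining", spread.level, "full again in", spread.seconds_until_full(), "s")
```

In this model the burst empties the bucket and needs roughly a minute to recover, while the evenly spread traffic is refilled as fast as it is spent and never gets close to empty.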

Just yesterday I posted a little code for grabbing the “try again in” time from the error and actually waiting that long, but you’d have to adapt it to pause your queue (untested, because I’d have to send 17k tokens per second to trigger it). Alternatively, you can monitor the rate-limit headers and throttle early.
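A rough sketch of both ideas follows; it is not the code from that earlier post. It assumes the error text contains a phrase like "try again in 1.2s", and that responses carry `x-ratelimit-remaining-tokens` and `x-ratelimit-reset-tokens` headers with durations such as `6m0s` or `20ms`; check those formats against what you actually log. The function names and the `min_remaining_tokens` threshold are made up for illustration.

```python
import re

# One "<number><unit>" piece; "ms" is listed before "m" so it matches first.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_TRY_AGAIN = re.compile(
    r"try again in ((?:\d+(?:\.\d+)?(?:ms|s|m|h))+)", re.IGNORECASE
)


def parse_duration(text):
    """Convert strings such as '20ms', '1.5s' or '6m0s' to seconds."""
    parts = _DURATION_PART.findall(text)
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def retry_after_seconds(error_message):
    """Return the 'try again in' wait from an error message, or None."""
    match = _TRY_AGAIN.search(error_message)
    return parse_duration(match.group(1)) if match else None


def pause_from_headers(headers, min_remaining_tokens):
    """Return how long to pause the queue, judged from response headers.

    Returns 0.0 while the remaining allowance is at or above the threshold,
    otherwise the advertised time until the token limit resets.
    """
    remaining = int(
        headers.get("x-ratelimit-remaining-tokens", min_remaining_tokens)
    )
    if remaining >= min_remaining_tokens:
        return 0.0
    return parse_duration(headers.get("x-ratelimit-reset-tokens", "0s"))


if __name__ == "__main__":
    message = "Rate limit reached. Please try again in 1.2s."
    print(retry_after_seconds(message))
    sample_headers = {
        "x-ratelimit-remaining-tokens": "500",
        "x-ratelimit-reset-tokens": "6m0s",
    }
    print(pause_from_headers(sample_headers, min_remaining_tokens=2000))
```

To shut off a queue with this, the worker that receives the error (or a low header reading) would sleep for the returned number of seconds before releasing any further requests, rather than letting each request retry on its own.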