Problem with the GPT4 chat usage rate limit

I am developing an application, and a few days ago we were given access to GPT-4. The application was previously using GPT-3.5-16k. Since switching it to GPT-4, I have started getting rate-limit failures on requests per minute. In the application log below you can see that I can still make 199 requests and that I still have margin in the number of tokens I'm sending. Could someone help me with this?

Content-Type: application/json; charset=utf-8
Content-Length: 366
Connection: keep-alive
vary: Origin
x-ratelimit-limit-requests: 200
x-ratelimit-limit-tokens: 10000
x-ratelimit-remaining-requests: 199
x-ratelimit-remaining-tokens: 3276
x-ratelimit-reset-requests: 300ms
x-ratelimit-reset-tokens: 40.339s

{
    "error": {
        "message": "Rate limit reached for 10KTPM-200RPM in organization [org-ID] on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at if you continue to have issues.",
        "type": "tokens",
        "param": null,
        "code": "rate_limit_exceeded"
    }
}
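As a rough sketch of how these headers can be read on the client side, the Python below copies the values from the log above, converts the reset durations to seconds, and checks whether a request of a given estimated size would fit in the remaining budget. `parse_reset` and `check_budget` are hypothetical helper names, and the duration format (a number followed by `h`, `m`, `s`, or `ms`, possibly repeated as in `1m30s`) is an assumption based on the two values shown here, not a documented spec. The token estimate is something your application has to supply itself.

```python
import re

# Header values copied from the response log above.
headers = {
    "x-ratelimit-limit-requests": "200",
    "x-ratelimit-limit-tokens": "10000",
    "x-ratelimit-remaining-requests": "199",
    "x-ratelimit-remaining-tokens": "3276",
    "x-ratelimit-reset-requests": "300ms",
    "x-ratelimit-reset-tokens": "40.339s",
}

# Assumed format: repeated <number><unit> pairs such as "300ms", "40.339s" or "1m30s".
# "ms" is listed before "m" in the pattern so "300ms" is matched as milliseconds.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_reset(value: str) -> float:
    """Convert a reset duration string into seconds (hypothetical helper)."""
    total, pos = 0.0, 0
    for match in _PART.finditer(value):
        if match.start() != pos:  # reject gaps or unexpected characters
            raise ValueError(f"unrecognised duration: {value!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(value):
        raise ValueError(f"unrecognised duration: {value!r}")
    return total


def check_budget(headers: dict[str, str], estimated_tokens: int) -> tuple[bool, float]:
    """Return (fits, seconds_to_wait) for a request of about estimated_tokens tokens.

    The wait is the time until the relevant limit fully resets, which is a
    conservative upper bound rather than the minimum wait.
    """
    if int(headers["x-ratelimit-remaining-requests"]) < 1:
        return False, parse_reset(headers["x-ratelimit-reset-requests"])
    if estimated_tokens > int(headers["x-ratelimit-remaining-tokens"]):
        return False, parse_reset(headers["x-ratelimit-reset-tokens"])
    return True, 0.0
```

With these headers, `check_budget(headers, 7000)` returns `(False, 40.339)`: a request of about 7,000 tokens does not fit in the 3,276 tokens left, even though 199 requests remain, which is consistent with the error saying the limit was hit "on tokens per min".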

On average, how many tokens are your requests? It looks like your one request used about 7,000 tokens (10,000 limit - 3,276 remaining = 6,724). If your next request was the same size, it would be expected to fail, because you only have about 3,200 tokens remaining. Slow down your requests or add an automatic retry with exponential backoff. See the rate limit docs. Also note that free and paid API accounts have different rate limits, and paid accounts get increased rates after 48 hours.
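A minimal sketch of the suggested retry with exponential delay, in Python. `RateLimitError` is a placeholder defined here, not a real SDK class; replace it with whatever exception your client raises on an HTTP 429 response. `call_with_backoff`, its parameters, and the defaults (1 s base delay, doubling up to 60 s, six retries) are illustrative assumptions, not values from the rate limit docs.

```python
import random
import time


class RateLimitError(Exception):
    """Placeholder for the exception your client raises on HTTP 429 (assumption)."""


def call_with_backoff(send, max_retries=6, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call send(); after each RateLimitError, sleep with exponential backoff and retry.

    Retry number n (counting from 0) waits min(max_delay, base_delay * 2**n)
    seconds plus up to 50% random jitter. After max_retries failed retries the
    error is re-raised to the caller.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from workers that were throttled together.
            sleep(delay + random.uniform(0, delay / 2))
```

Passing `sleep` in as a parameter keeps the function easy to test. Note that retrying only smooths over bursts: with requests of about 7,000 tokens against a 10,000 tokens-per-minute limit, the sustained average cannot exceed roughly 1.4 such requests per minute (10,000 / 7,000), no matter how the retries are timed.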
