RPM rate limits at 60 when using gpt-4 with Assistant API

josh.sacks · February 28, 2024, 5:28pm

We keep hitting a rate limit when using the Assistant API with gpt-4. The error talks about a limit of 60 RPM, but we are tier 4, with an RPM of 10K!

We’re using the Python API bindings. Any ideas on why we are being so severely rate limited?

dignity_for_all · February 28, 2024, 6:01pm

Regarding that limit, I think it applies only to the Assistants API, which can consume a huge number of tokens at once…

integritystl · February 28, 2024, 6:56pm

We are facing this issue as well using the GPT-3.5-turbo model with Assistant API and the node.js bindings. I’ve tried using the OpenAI help to ask about it but keep getting directed to the Request Increase Rate Limits form that is now closed.

Our organization is on Usage Tier 1, so the limit should be high enough. We need to know whether we should switch away from using Assistant. We are building a chatbot, and as things stand, 30 active users each asking 2 questions within a minute would be enough to hit this limit.

_j · February 28, 2024, 7:02pm

Food for thought: “gpt-4” by name is gpt-4-0613, which is only trained on functions, not parallel tool calls. Retrieval in assistants needs one of the gpt-4-turbo-preview models to operate correctly.

Within Assistants, a run can go into a deep iterative loop of things to be done, or of errors, without ever returning. You can check the run steps to see what’s actually whirring inside and setting tokens on fire.
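As a rough sketch of what checking the run steps could look like: the helper below tallies step types and token usage for a single run. It assumes the v1 openai Python client, where client.beta.threads.runs.steps.list(thread_id=..., run_id=...) returns steps carrying a type and an optional usage. The name summarize_run_steps is hypothetical, and the IDs in the usage comment are placeholders.

```python
from collections import Counter


def summarize_run_steps(client, thread_id, run_id):
    """Count step types and total tokens for one Assistants run.

    `client` is assumed to be an openai.OpenAI() instance (v1 SDK).
    """
    steps = client.beta.threads.runs.steps.list(thread_id=thread_id, run_id=run_id)
    step_types = Counter()
    total_tokens = 0
    for step in steps:
        step_types[step.type] += 1
        # usage can be None while a step is still in progress
        usage = getattr(step, "usage", None)
        if usage is not None:
            total_tokens += usage.total_tokens
    return dict(step_types), total_tokens


# Usage (placeholder IDs, requires the openai package and an API key):
#   import openai
#   print(summarize_run_steps(openai.OpenAI(), "thread_...", "run_..."))
```

A run with many tool_calls steps and a large token total is a hint that the assistant is looping internally rather than that your own code is sending too many requests.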

Here’s a quickie snippet for getting the rate limits applied to your model call and account, from response headers:

import openai

# Reads OPENAI_API_KEY from the environment; 5-second request timeout
api = openai.Client(timeout=5)
params = {"model": "gpt-4-0613", "max_tokens": 10, "top_p": 0.9,
  "messages": [{"role": "user", "content": " I like robots like you"}]}
try:
    # with_raw_response exposes the HTTP headers alongside the parsed body
    c = api.chat.completions.with_raw_response.create(**params)
except Exception as e:
    print(f"Error: {e}")
else:
    # Only runs if the request succeeded, so c is always defined here
    for key, value in sorted(c.headers.items()):
        if key.startswith("x-ra"):  # the x-ratelimit-* headers
            print(f"{key}: {value}")
    print(f"---\n{c.parse().choices[0].message.content}")
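If you want those values in a structured form rather than printed, something like the following might do. It is only a sketch: summarize_rate_limits is a hypothetical helper, and it assumes the header names follow the x-ratelimit-{limit,remaining,reset}-{requests,tokens} pattern, with limit and remaining as plain integers and reset as a duration string.

```python
def summarize_rate_limits(headers):
    """Pull x-ratelimit-* values out of a header mapping.

    Returns {"requests": {...}, "tokens": {...}} holding whichever of
    limit / remaining / reset were present.
    """
    summary = {"requests": {}, "tokens": {}}
    prefix = "x-ratelimit-"
    for key, value in headers.items():
        key = key.lower()
        if not key.startswith(prefix):
            continue
        # e.g. "x-ratelimit-remaining-requests" -> ["remaining", "requests"]
        parts = key[len(prefix):].split("-")
        if len(parts) == 2 and parts[1] in summary:
            field, kind = parts
            # Assumption: limit/remaining are integers, reset is a string like "1s"
            summary[kind][field] = value if field == "reset" else int(value)
    return summary
```

Passing c.headers from the snippet above gives you summary["requests"]["limit"], which you can compare against the RPM your tier is supposed to have.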
