We keep hitting a rate limit when using the Assistant API with gpt-4. The error talks about a limit of 60 RPM, but we are tier 4, with an RPM of 10K!
We’re using the Python API bindings. Any ideas on why we are being so severely rate limited?
Regarding that limitation, I think it applies only to the Assistants API, which consumes a huge number of tokens at once…
We are facing this issue as well using the GPT-3.5-turbo model with Assistant API and the node.js bindings. I’ve tried using the OpenAI help to ask about it but keep getting directed to the Request Increase Rate Limits form that is now closed.
The organization we use for our Assistant is Usage Tier 1, so the limits should be high enough. We need to know if we should switch away from using Assistant. We are currently building a chatbot, and as things stand we couldn’t have more than 30 active users each ask 2 questions within a minute before hitting this limit.
Food for thought: “gpt-4” by name is gpt-4-0613, which is only trained on functions, not parallel tool calls. Retrieval in assistants needs one of the gpt-4-turbo-preview models to operate correctly.
Within Assistants, a run can fall into a deep iterative loop of tasks or errors without ever returning. You can check the run steps to see what’s actually whirring inside and setting tokens on fire.
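A minimal sketch of that run-step check, assuming the Python bindings' `client.beta.threads.runs.steps.list(...)` call and the `id`, `type`, `status`, and `usage` fields on each run step. The helper names `summarize_steps` and `inspect_run` are made up for illustration, and the thread and run IDs are whatever your own code already holds:

```python
def summarize_steps(steps):
    """Return (lines, total_tokens) for an iterable of run-step objects."""
    lines, total = [], 0
    for step in steps:
        # usage is assumed to be None while a step is still in progress
        usage = getattr(step, "usage", None)
        tokens = usage.total_tokens if usage else 0
        total += tokens
        lines.append(f"{step.id} {step.type} {step.status} tokens={tokens}")
    return lines, total


def inspect_run(client, thread_id, run_id):
    """Print every step of one run; `client` is assumed to be an openai.OpenAI instance."""
    steps = client.beta.threads.runs.steps.list(thread_id=thread_id, run_id=run_id)
    lines, total = summarize_steps(steps)
    for line in lines:
        print(line)
    print(f"total tokens across steps: {total}")
    return total
```

A long list of `tool_calls` steps for a single user question is a hint that the run is looping, and each of those steps counts against your limits.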
Here’s a quickie snippet for getting the rate limits applied to your model call and account, from response headers:
import openai

api = openai.Client(timeout=5)  # reads OPENAI_API_KEY from the environment
params = {"model": "gpt-4-0613", "max_tokens": 10, "top_p": 0.9,
          "messages": [{"role": "user", "content": "I like robots like you"}]}
try:
    # with_raw_response exposes the HTTP headers alongside the parsed body
    c = api.chat.completions.with_raw_response.create(**params)
except Exception as e:
    print(f"Error: {e}")
else:
    # print only the x-ratelimit-* headers, sorted by name
    for key, value in sorted(c.headers.items()):
        if key.startswith("x-ra"):
            print(f"{key}: {value}")
    print(f"---\n{c.parse().choices[0].message.content}")
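If you want those numbers in code rather than printed, here is a rough sketch that pulls the request and token budgets out of the headers. It assumes the documented `x-ratelimit-limit-*`, `x-ratelimit-remaining-*`, and `x-ratelimit-reset-*` header names; `rate_limit_status` is a made-up helper name, and the reset value is left as the raw string rather than parsed:

```python
def rate_limit_status(headers):
    """Summarize request/token budgets from x-ratelimit-* response headers."""
    status = {}
    for kind in ("requests", "tokens"):
        limit = headers.get(f"x-ratelimit-limit-{kind}")
        remaining = headers.get(f"x-ratelimit-remaining-{kind}")
        if limit is None or remaining is None:
            continue  # this response did not carry the header pair
        status[kind] = {
            "limit": int(limit),
            "remaining": int(remaining),
            # raw string such as "1s"; not parsed in this sketch
            "reset": headers.get(f"x-ratelimit-reset-{kind}"),
        }
    return status
```

With the snippet above you could call `rate_limit_status(c.headers)` and compare the `limit` under `"requests"` against the 60 RPM in the error message.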