I’m on Tier-5 and using gpt-4o-mini through the Assistants API, but I keep getting RateLimitError: 429 even though my request volume is very low and should be far below Tier-5 limits.
openai.RateLimitError: Error code: 429 - {
  'error': {
    'message': "You've exceeded the rate limit, please slow down and try again later.",
    'type': 'invalid_request_error',
    'param': None,
    'code': 'rate_limit_exceeded'
  }
}
This happens when calling:
client.beta.threads.messages.create(thread_id=thread.id, role="user", content=q)
My concern:
Given that I’m on Tier-5 and using a lightweight model (gpt-4o-mini), I was expecting higher limits. But I’m still getting 429 errors from thread message creation.
Can someone clarify:
- What specific limit is being hit? (RPM, TPM, message creation rate, or concurrency?)
- Does the Assistants API have internal limits that aren't documented?
- Has any backend throttling been introduced recently?
- Is there a recommended way to avoid these 429s during long-running batch processing?
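For context, the workaround I'm currently testing is a plain exponential-backoff retry wrapper around the message-creation call. This is just an illustrative sketch (the `with_backoff` helper and its parameters are my own, not anything from the SDK); in my real code the wrapped callable is the `client.beta.threads.messages.create(...)` call shown above:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, is_rate_limit=None):
    """Call fn(); on a rate-limit error, sleep base_delay * 2**attempt
    plus a little jitter and retry, up to max_retries attempts.

    is_rate_limit: optional predicate deciding whether an exception
    should be retried (e.g. isinstance(e, openai.RateLimitError));
    if None, every exception is treated as retryable.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            if is_rate_limit is not None and not is_rate_limit(e):
                raise  # not a 429-style error; don't swallow it
            if attempt == max_retries - 1:
                raise  # out of retries; surface the last error
            # exponential backoff with jitter to avoid thundering herd
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)

# Intended usage (hypothetical, mirroring my call site):
# msg = with_backoff(
#     lambda: client.beta.threads.messages.create(
#         thread_id=thread.id, role="user", content=q
#     ),
#     is_rate_limit=lambda e: isinstance(e, openai.RateLimitError),
# )
```

It masks the symptom well enough for small batches, but it obviously doesn't explain why I'm hitting 429s at such low volume in the first place.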
Thanks.