Description
When requests to the Assistant API exceed the rate limit, the returned errors are inconsistent. Sometimes the API returns a clear 429 Too Many Requests with a Retry-After header; other times it returns generic server errors (500 or 502) with no clear message or retry instructions.
Steps to Reproduce
- Send rapid API requests (multiple threads or multiple runs per thread) to the same Assistant.
- Observe the returned error codes and headers.
- Note that the response is not always a standard 429 with a Retry-After header.
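To make the observation step repeatable, a small harness can fire a burst of concurrent requests and count each distinct status/header combination. This is a minimal Python sketch under stated assumptions: `send_request` is a hypothetical zero-argument callable (not part of any SDK) that performs one API call and returns the HTTP status code and a dict of response headers. Wiring it to a real client is left out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def classify(status, headers):
    """Label one response by status code and presence of a Retry-After header."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    has_retry_after = any(k.lower() == "retry-after" for k in headers)
    if 200 <= status < 300:
        return "ok"
    suffix = "with Retry-After" if has_retry_after else "without Retry-After"
    return f"{status} {suffix}"


def run_burst(send_request, n_requests, max_workers=16):
    """Fire n_requests concurrent calls and count each response category.

    send_request is a hypothetical zero-argument callable returning
    (status_code, headers_dict); it is not part of any SDK.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda _: send_request(), range(n_requests)))
    return Counter(classify(status, headers) for status, headers in results)
```

Under the expected behavior, the counter would contain only "ok" and "429 with Retry-After" entries; any "500 without Retry-After", "502 without Retry-After", or "429 without Retry-After" entries reproduce the problem described in this report.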
Expected Result
When the rate limit is exceeded, the API should:
- Always return a 429 Too Many Requests status.
- Include an accurate Retry-After header.
- Provide clear guidance in the error body.
Actual Result
- Sometimes a 500 or 502 is returned with no useful detail.
- Sometimes a 429 is returned, but Retry-After is missing or inaccurate.
Impact
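This makes it hard to implement robust backoff and retry logic, because clients cannot reliably tell a rate-limit rejection from a genuine server failure. The result is unnecessary failed requests and a degraded user experience.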
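Until the errors are consistent, a client has to guess how long to wait. One defensive approach, sketched here in Python as an assumption rather than documented API behavior, is to honor Retry-After when it is present (as either delay-seconds or an HTTP-date, the two forms HTTP allows) and otherwise fall back to capped exponential backoff with full jitter. Treating 500 and 502 as retryable, and the names `retry_delay`, `parse_retry_after`, and `RETRYABLE_STATUSES`, are illustrative choices, not part of any SDK.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumption: 500/502 are retried because this report sees them under load;
# 503 is included as a common transient status. Not documented API behavior.
RETRYABLE_STATUSES = {429, 500, 502, 503}


def parse_retry_after(value, now=None):
    """Return the Retry-After delay in seconds, or None if absent or malformed."""
    if value is None:
        return None
    value = value.strip()
    # Form 1: delay-seconds, a non-negative integer.
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: an HTTP-date such as "Sun, 01 Jun 2025 12:00:30 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means "retry now".
    return max(0.0, (when - now).total_seconds())


def retry_delay(status, headers, attempt, base=1.0, cap=60.0, now=None):
    """Seconds to wait before retry number `attempt` (0-based), or None to give up."""
    if status not in RETRYABLE_STATUSES:
        return None
    # HTTP header names are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    hinted = parse_retry_after(lowered.get("retry-after"), now)
    if hinted is not None:
        return hinted  # Prefer the server's own hint when it is usable.
    # No usable hint: capped exponential backoff with full jitter.
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

Retrying a 500 or 502 blindly can duplicate work if the original request (for example, creating a run) actually succeeded on the server, so this fallback is safest for idempotent requests or for requests whose outcome can be checked before retrying.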
Environment
- Assistant API
- Models: GPT-4, GPT-4 Turbo, GPT-4o-mini
- Observed June 2025
Additional Context
The issue occurs more often under high concurrency or when many threads are active at once.
Suggested Priority
Medium - affects reliability and scaling.