[Bug] Inconsistent or unclear rate limit errors in Assistant API

Needariley · June 20, 2025, 1:39am

Description

When requests to the Assistants API exceed the rate limit, the errors returned are inconsistent. Sometimes the API returns a clear 429 Too Many Requests with a Retry-After header, but at other times it returns a generic server error (500 or 502) with no clear message or retry instructions.

Steps to Reproduce

Send rapid API requests (multiple threads or multiple runs per thread) to the same Assistant.
Observe the returned error codes and headers.
Note that the response is not always a standard 429 with Retry-After.
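
For anyone who wants to reproduce this, here is a rough sketch of how the responses could be tallied. It is not tied to any SDK: `send` is a hypothetical callable you supply that performs one API request and returns `(status_code, headers)`. The function name, `total`, and `workers` are also my own placeholders.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Mapping, Tuple

# Hypothetical shape: one request in, (status_code, headers) out.
Response = Tuple[int, Mapping[str, str]]


def tally_responses(send: Callable[[], Response],
                    total: int = 50,
                    workers: int = 10) -> Counter:
    """Fire `total` requests across `workers` threads and count
    (status_code, has_retry_after) pairs."""

    def classify(_: int) -> Tuple[int, bool]:
        status, headers = send()
        # Header names are case-insensitive, so compare in lowercase.
        has_retry_after = any(k.lower() == "retry-after" for k in headers)
        return (status, has_retry_after)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(classify, range(total)))
```

If the behavior described here is present, the resulting counter should contain a mix of keys such as `(429, True)`, `(429, False)`, and `(500, False)` rather than only `(429, True)`.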

Expected Result

When the rate limit is exceeded:

Always return a clear 429 status.
Include an accurate Retry-After header.
Provide clear guidance in the error body.

Actual Result

Sometimes the API returns a 500 or 502 with no useful detail.
Sometimes it returns a 429, but the Retry-After header is missing or inaccurate.

Impact

This makes it hard to implement robust backoff and retry logic, and it leads to unnecessary failed requests and a degraded user experience.
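
As a stopgap, this is roughly the kind of defensive delay calculation the current behavior forces on clients. It is only a sketch under my own assumptions: that 500 and 502 should be treated as retryable alongside 429, that Retry-After should be honored when it parses, and that exponential backoff with full jitter is the fallback. `RETRYABLE_STATUSES`, `parse_retry_after`, `retry_delay`, `base`, and `cap` are illustrative names, not part of any API.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

# Assumption: the 500/502 responses seen under load are disguised rate limits.
RETRYABLE_STATUSES = {429, 500, 502}


def parse_retry_after(value: Optional[str]) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if missing/unparseable.

    Accepts both forms allowed by HTTP: delta-seconds or an HTTP-date.
    """
    if not value:
        return None
    value = value.strip()
    try:
        seconds = float(value)
    except ValueError:
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return None
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        seconds = (when - datetime.now(timezone.utc)).total_seconds()
    return max(0.0, seconds)


def retry_delay(status: int,
                headers: Mapping[str, str],
                attempt: int,
                base: float = 1.0,
                cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based),
    or None if the status is not considered retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    lowered = {k.lower(): v for k, v in headers.items()}
    hinted = parse_retry_after(lowered.get("retry-after"))
    if hinted is not None:
        # Trust the server hint, but never wait longer than `cap`.
        return min(cap, hinted)
    # Fallback: full jitter, uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

For example, `retry_delay(429, {"Retry-After": "2"}, 0)` gives `2.0`, while `retry_delay(502, {}, 3)` gives a random value between 0 and 8 seconds. This does not help when Retry-After is present but inaccurate, which is part of why a consistent 429 with an accurate header matters.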

Environment

Assistants API
Models: GPT-4, GPT-4 Turbo, GPT-4o-mini
Observed June 2025

Additional Context

The issue occurs more often under high concurrency or when using multiple threads.

Suggested Priority

Medium - affects reliability and scaling.
