Eval runs started via the API can have some (many!) of their tests fail due to exceeding the rate limit. To be clear, these tests are scheduled and run by OpenAI on their own timeline; the developer has no control over when the eval_run tests actually execute. So OpenAI's code is scheduling these eval_runs too fast and not retrying correctly when the rate limit is exceeded.
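For context, here is roughly how the runs get started (a minimal sketch; the eval ID, file ID, model, and exact `data_source` shape are placeholders, not an exact reproduction of my setup):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder IDs and data_source config, for illustration only.
# The create call returns immediately with status "queued"; from
# there OpenAI schedules and executes the individual tests on its
# side, so any 429s during the run are outside the caller's control.
run = client.evals.runs.create(
    eval_id="eval_abc123",
    name="example-run",
    data_source={
        "type": "completions",
        "model": "gpt-4o-mini",
        "source": {"type": "file_id", "id": "file_abc123"},
    },
)
print(run.id, run.status)
```

Client-side backoff doesn't help here, since the rate-limit errors happen inside OpenAI's execution of the run, not on the create call itself.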
Seems like a serious oversight… Anyone know if this is normal? This started happening today.