I’m using the Assistants API with the code_interpreter tool in my application, and starting today I’m seeing a lot of “too many requests” errors appearing in the chat thread itself.
These seem to be caused by OpenAI-internal rate limits on file access from Assistants, but I can’t find any documentation about those limits or how to avoid this class of error. Also, the request itself is not returning an HTTP 429, so it’s quite difficult to implement the kind of retry logic that would normally get around this.
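For what it’s worth, here is a rough sketch of the workaround I’m experimenting with: since the error never surfaces as a 429, I poll the run and then scan the assistant’s latest message text for rate-limit wording before retrying with backoff. The `RATE_LIMIT_MARKERS` list is just a placeholder for whatever phrasing your threads actually show; this is a sketch, not a robust solution.

```python
import time
from openai import OpenAI

client = OpenAI()

# Placeholder phrases observed in the thread when the internal file-access
# limit is hit; adjust to match what your assistant actually returns.
RATE_LIMIT_MARKERS = ["too many requests", "rate limit"]


def run_and_retry(thread_id: str, assistant_id: str, max_attempts: int = 4):
    """Create a run and retry with backoff if the reply looks rate-limited."""
    for attempt in range(max_attempts):
        run = client.beta.threads.runs.create(
            thread_id=thread_id, assistant_id=assistant_id
        )

        # Poll until the run reaches a terminal state.
        while run.status in ("queued", "in_progress", "cancelling"):
            time.sleep(1)
            run = client.beta.threads.runs.retrieve(
                run_id=run.id, thread_id=thread_id
            )

        # Case 1: the run itself failed with a rate-limit error code.
        if (
            run.status == "failed"
            and run.last_error is not None
            and run.last_error.code == "rate_limit_exceeded"
        ):
            time.sleep(2 ** attempt)
            continue

        # Case 2: the run "completed", but the assistant's message text
        # contains the rate-limit complaint (the behavior described above).
        messages = client.beta.threads.messages.list(thread_id=thread_id, limit=1)
        latest = messages.data[0]
        text = " ".join(
            part.text.value for part in latest.content if part.type == "text"
        )
        if any(marker in text.lower() for marker in RATE_LIMIT_MARKERS):
            time.sleep(2 ** attempt)
            continue

        return latest  # looks like a normal reply

    raise RuntimeError("Still rate-limited after retries")
```

One obvious downside: the failed assistant message stays in the thread, so the retry run still sees it as context. You could delete or otherwise account for it before re-running, but that’s extra plumbing for what really should just be a 429.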
Screenshot from thread:
Yesterday we were seeing persistent memory errors when the Assistant thread was trying to read from a file (community posts on that issue here and here). Are these limits enforced per account, or are they a result of global load? Is OpenAI working on scaling these systems? Is anyone else running into this?