API assistant run failing with no error

I'm using Assistants and the run status says 'failed', but when I print the run I don't see any errors. Here is what the run looks like:

Run(id='run_gvIWhRWSWzLuJbq8eEqWVB2H', assistant_id='asst_9vVVph6XgZz6Da8JQlnJa2X4', cancelled_at=None, completed_at=None, created_at=1706120995, expires_at=1706121595, failed_at=None, file_ids=[], instructions='Whats 9 + 21', last_error=None, metadata={}, model='gpt-4-1106-preview', object='thread.run', required_action=None, started_at=None, status='queued', thread_id='thread_V0W3NtXrRMIxvFhM4F3Z7sG6', tools=[ToolAssistantToolsRetrieval(type='retrieval')], usage=None)

What's the issue? Thanks!

Note the status: it's 'queued', not 'failed'. You have to keep polling the run's status until it moves from queued, through in_progress, to completed. Then you can retrieve the result from the thread.
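A minimal polling sketch. The retrieve step is passed in as a callable so the waiting logic is self-contained; with the real openai v1 Python SDK that callable would wrap client.beta.threads.runs.retrieve(thread_id=..., run_id=...):

```python
import time

# Statuses at which polling should stop (per the Assistants run lifecycle).
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "requires_action"}

def wait_for_run(retrieve, poll_interval: float = 1.0):
    """Poll until the run leaves 'queued'/'in_progress'.

    `retrieve` is a zero-argument callable returning the current run object,
    e.g. lambda: client.beta.threads.runs.retrieve(thread_id=tid, run_id=rid)
    with the openai v1 SDK. The sleep keeps the loop from burning through
    your per-minute request budget.
    """
    while True:
        run = retrieve()
        if run.status in TERMINAL_STATUSES:
            return run
        time.sleep(poll_interval)
```

Once wait_for_run returns a run with status "completed", you can list the thread's messages to get the assistant's reply.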

Got it, I hit a rate limit. If I hit the 10,000 TPD limit for gpt-3.5-turbo, can I still use GPT-4? Or do the rate limits apply across models?

Tier 1 rate limits:

Model                 RPM    RPD     TPM      TPD
gpt-4                 500    10,000  10,000   -
gpt-4-1106-preview *  500    10,000  150,000  500,000
gpt-3.5-turbo         3,500  10,000  60,000   -

(RPM = requests per minute, RPD = requests per day, TPM = tokens per minute, TPD = tokens per day)

Each model has its own rate-limit bucket. However, it has previously been reported that going well over one of the "per minute" limits (easy to do when you make many requests without bounding the output size) could also shut off the other models until the overage decayed.
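Since each model has its own bucket, one way to ride out a limit is to fall back to a second model. The sketch below is generic: the RateLimitError class here is a stand-in (with the real openai v1 SDK you would catch openai.RateLimitError), and the retry/backoff numbers are arbitrary placeholders:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's rate-limit exception (e.g. openai.RateLimitError)."""

def call_with_fallback(call, models, retries=2, backoff=1.0):
    """Try each model in order. On a rate-limit error, wait and retry,
    then move on to the next model, which has its own separate bucket.

    `call` is a one-argument callable taking a model name, e.g.
    lambda m: client.chat.completions.create(model=m, messages=msgs)
    with the openai v1 SDK.
    """
    for model in models:
        for attempt in range(retries):
            try:
                return call(model)
            except RateLimitError:
                time.sleep(backoff * (attempt + 1))  # simple linear backoff
    raise RateLimitError("all models exhausted")
```

Whether this helps in practice depends on whether the reported cross-model shutoff applies to your account; under normal conditions the buckets are independent.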

(My own rate limits are a bit too high to explore this any further without spending $5.)

Keep in mind that you need to sleep between polls to avoid hitting the limit. For Assistants, 1-2 seconds is appropriate, but use at least something like 0.1 s or you will hit the rate limit quickly.