Assistant API performance

Assistant is AMAZING !! But currently it takes around 30s to get response from assistant that includes one or two tool calls (including planning and a confirmation message after).

Any news on when that will be improved ? Does having OpenAI enterprise account help with that ?

We’re using Assistant API V2, and gpt-4-turbo, (gpt-4o had lower accuracy, and didn’t improve response time that much)


I gave up on assistants api due to performance. I now use it only for thread management. Recent updates allow adding messages as an assistant so i basically call the completion api for function calling and once i receive the tools output and execute the function, the data response is passed to completion api with streaming and response from streaming is saved back to assisntants thread messages as an assistant role

We were thinking about the somewhat similar approach. In your experience, have you noticed any difference in function calling and specially handling errors and fixing those based on response feedback ? we heavily rely on providing feedback to tool calls to make it correct it’s response and can’t go without that.

@amoradian -In function calling you can handle errors returning the error message back to
the LLM to submit tool outputs and poll if required. It understands the context and re-prompts user with feedback. Once the user enters the permitted value, it calls the tool again using context from the previous conversation. If you are using AzureOpen AI content filter toggle might cause slow outputs and members of the forum reported lightning fast responses once they had dealt with it. Hope this helps, cheers!

Notes: Good prompt engineering with proper description of tools with appropriate temp and top p values should do it. You could use a lower value close to 0 for top-p to retrieve higher probability tokens w.r.t context.