I’m having really bad response times in my Assistant app, so I’ve built some more detailed benchmarks to try to isolate where in the API I’m losing so much time. At random, my app will hang for as long as 30 seconds. A bit about my app and methodology:
I have about 20 tool functions. As a best practice, I preprocess the user’s query with a Chat Completion that narrows down the list of possible tools using the tool names only. This takes about 0.5s on average and returns 1-2 tool names. Then I submit the user’s query to the current Thread, passing in only those one or two tool definitions. This saves tokens and, according to the documentation, should also save time. I’m specifically using an asynchronous streaming Assistant with GPT-4o mini.
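For anyone who wants to reproduce the preselection step, here is a minimal sketch of the idea, not my exact code. The tool definitions, the prompt wording, and the `ask_model` callable are all hypothetical stand-ins; `ask_model(system_prompt, user_query)` is assumed to wrap the Chat Completion call and return the model's reply as plain text.

```python
from typing import Callable, Dict, List

# Hypothetical tool definitions in the "function" tool format.
ALL_TOOLS: List[Dict] = [
    {"type": "function", "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    }},
    {"type": "function", "function": {
        "name": "get_time",
        "description": "Return the current time in a time zone.",
        "parameters": {"type": "object", "properties": {"tz": {"type": "string"}}},
    }},
]


def build_selector_prompt(tools: List[Dict]) -> str:
    """Build a short system prompt that lists tool names only (no schemas)."""
    names = ", ".join(t["function"]["name"] for t in tools)
    return (
        "Reply with a comma-separated list of the tool names needed to "
        f"answer the user. Available tools: {names}"
    )


def parse_tool_names(reply: str, tools: List[Dict]) -> List[str]:
    """Keep only names that really exist, in reply order, without duplicates."""
    known = {t["function"]["name"] for t in tools}
    picked: List[str] = []
    for part in reply.split(","):
        name = part.strip()
        if name in known and name not in picked:
            picked.append(name)
    return picked


def narrow_tools(
    query: str,
    tools: List[Dict],
    ask_model: Callable[[str, str], str],  # hypothetical wrapper around the completion call
) -> List[Dict]:
    """Return only the tool definitions the preselection step asked for."""
    reply = ask_model(build_selector_prompt(tools), query)
    names = parse_tool_names(reply, tools)
    return [t for t in tools if t["function"]["name"] in names]
```

The narrowed list is what then gets passed along with the user's query, so the full set of 20 schemas never goes into the run. Filtering against the known names also guards against the model replying with a tool that does not exist.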
To simplify testing, I built a small app that has a single tool and runs it using three methods:
Chat Completion:
- first completion is the query, which returns the tool_calls
- second completion includes the tool output and returns the final result
Assistant w Polling:
- fresh thread
- first run (create_and_poll) is the query, which returns the tool_calls
- second run (submit_tool_outputs_and_poll) returns the final result
Assistant w Stream:
- fresh thread
- uses an AssistantEventHandler
- first run (stream) is the query, handler takes over until it finishes (stream.until_done())
- handler on_event is triggered, tools are processed and their outputs are submitted in a second run (submit_tool_outputs_stream), which returns the final result
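The comparison across the three methods above can be sketched with a small timing harness like the one below. This is an illustration under assumptions, not my actual benchmark: each entry in `methods` is a hypothetical zero-argument callable that performs one full round trip (query, tool call, final result) and returns the total tokens it used.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(methods: Dict[str, Callable[[], int]], repeats: int = 5) -> Dict[str, dict]:
    """Time each method `repeats` times and summarize latency and token usage."""
    results: Dict[str, dict] = {}
    for name, run_once in methods.items():
        durations = []
        tokens = []
        for _ in range(repeats):
            start = time.perf_counter()
            used = run_once()  # one complete round trip; returns total tokens
            durations.append(time.perf_counter() - start)
            tokens.append(used)
        results[name] = {
            "mean_s": statistics.mean(durations),
            "max_s": max(durations),  # worst case exposes the occasional long hang
            "mean_tokens": statistics.mean(tokens),
        }
    return results


def print_report(results: Dict[str, dict]) -> None:
    """Print one line per method so the runs are easy to compare side by side."""
    for name, r in results.items():
        print(f"{name}: mean {r['mean_s']:.2f}s, max {r['max_s']:.2f}s, "
              f"mean tokens {r['mean_tokens']:.0f}")
```

Reporting the maximum alongside the mean matters here because the problem is intermittent: an average over a few runs can look acceptable while a single run still hangs for many seconds.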
Even with an empty thread, both Assistant methods use over 3x the tokens of a Chat Completion. They are also half the speed of a Chat Completion at best. Assistant w Polling regularly produces inconsistent results even with this simplified and optimized test.
And yet, when I build an Assistant in the Playground with the same prompt and function definition, it’s near-instant. Is this perhaps a problem with the Python API?
This has been a problem for a while and is a total dealbreaker for real world use. We can’t expect users to sit around for ten seconds or worse. I would really appreciate it if someone from the dev team could address it and provide some guidance. Thanks.