Benchmarking Tools Again - SO SLOW

GoldenJoe · October 30, 2024, 7:24am

I’m having really bad response times in my Assistant app, so I’ve built some new benchmarks with more detail to try and isolate where in the API I’m losing so much time. Randomly my app will hang for as long as 30 seconds. A bit about my app and methodology:

I have about 20 tool functions. As a best practice, I preprocess the user’s query with a Chat Completion that narrows down the list of possible tools using the tool name only. This takes about 0.5s on average and returns 1-2 tool names. Then, I submit the user’s query to the current Thread, passing in only the one or two tool definitions. This saves tokens and should save on time according to the documentation. I’m specifically using an asynchronous streaming assistant with GPT 4o-mini.

To simplify testing, I built a small app that has a single tool and runs it using three methods:

Chat Completion:

first completion is the query, which returns the tool_calls
second completion includes the tool output and returns the final result

Assistant w Polling:

fresh thread
first run (create_and_poll) is the query, which returns the tool_calls
second run (submit_tool_outputs_and_poll) returns the final result

Assistant w Stream:

fresh thread
uses an AssistantEventHandler
first run (stream) is the query, handler takes over until it finishes (stream.until_done())
handler onEvent is triggered, tools are processed and submitted to a second run (submit_tool_outputs_stream) which returns the final result

Even with an empty thread, both Assistant methods use over 3x the tokens of a Chat Completion. They are also half the speed of a Chat Completion at best. Assistant w Polling regularly produces inconsistent results even with this simplified and optimized test.

And yet, when I build an Assistant in a Playground with the same prompt and function definition, it’s near-instant. Is this perhaps a problem with the Python API?

This has been a problem for a while and is a total dealbreaker for real world use. We can’t expect users to sit around for ten seconds or worse. I would really appreciate it if someone from the dev team could address it and provide some guidance. Thanks.

ollie2 · January 14, 2025, 5:50am

My testing, which I’ve posted extensive results in the below thread, also gave me the impression assistants isn’t suitable for production use. I can’t have my users waiting 10-35 seconds for a response.

Topic		Replies	Views
Assistants API Performance API api , assistants-api	12	2994	December 29, 2025
Assistant API Performance is Very Slow API plugin-development , api	11	5502	December 29, 2025
Assistant/Thread Model Stress Test: Concerning Results [See inside] API	19	580	December 29, 2025
Function calling through the assistant API incessant polling API assistants-api	4	210	December 29, 2025
Assistant API request is taking very long response time API assistants-api	3	424	December 29, 2025

Benchmarking Tools Again - SO SLOW

Related topics