Function calling through the Assistants API: incessant polling

Hello,

I’ve been building an application using the Assistants API and I’m currently trying to optimize how quickly we can get a response back to the user - unfortunately I’m regularly losing 15s+ polling the API after submitting the tool response.

I’ve followed the instructions here exactly (https://platform.openai.com/docs/assistants/tools/function-calling), except that I define the assistants in the OpenAI dashboard instead of in my code.

Here’s an example; the POST request is the submission of my function’s output:

2024-11-24 19:16:20,729 INFO     HTTP Request: POST https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE/submit_tool_outputs "HTTP/2 200 OK"
2024-11-24 19:16:20,966 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:21,767 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:22,493 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:23,273 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:24,062 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:24,911 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:25,657 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:26,449 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:27,267 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:28,087 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:29,315 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:30,135 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:30,953 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:31,725 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:32,900 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"
2024-11-24 19:16:33,718 INFO     HTTP Request: GET https://api.openai.com/v1/threads/thread_KBxCEydBEwzgAu7WamCT1qK3/runs/run_N7JSwDFjFP7oJMgtAZZ7teiE "HTTP/2 200 OK"

That’s a 13s wait just for the API to acknowledge receiving the output and add it to the assistant’s context, I assume.

Is there any way I can speed this up? Would using the Chat Completions API be a better option for me at this point, while the Assistants API is still in beta?

Thanks!

That is likely the time for the AI to generate the response.

Add instructions like: “Your maximum response length is a single short sentence.”
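
Since the assistants are defined in the dashboard, that constraint can also be attached per run rather than editing the assistant itself. A minimal sketch with a sync client, assuming thread_id and assistant_id are already known (illustrative names):

from openai import OpenAI

client = OpenAI()

run = client.beta.threads.runs.create(
    thread_id=thread_id,
    assistant_id=assistant_id,
    # Appended to the assistant's dashboard instructions for this run only.
    additional_instructions="Your maximum response length is a single short sentence.",
)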

Instead of polling, you can use streaming and receive the output as it is being generated, which makes the app feel more responsive.
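
For the tool-output step in your logs specifically, the Python SDK has a streaming variant of the submission call, so the SDK doesn’t have to issue that GET loop. A minimal sync sketch, assuming thread_id, run_id, tool_call_id, and output_json are already in hand (those names are illustrative):

from openai import OpenAI

client = OpenAI()

# Stream the remainder of the run instead of polling GET /runs/{run_id}.
with client.beta.threads.runs.submit_tool_outputs_stream(
    thread_id=thread_id,
    run_id=run_id,
    tool_outputs=[{"tool_call_id": tool_call_id, "output": output_json}],
) as stream:
    for text in stream.text_deltas:
        print(text, end="", flush=True)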

There is no response for it to generate; it only needs to return exactly what my function sent it.
I am using streaming:

async with client.beta.threads.runs.stream(
    thread_id=thread_id,
    assistant_id=frontman_id,
    event_handler=event_handler,
) as stream:
    await stream.until_done()  # wait for the streamed run to finish
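
One thing worth checking: runs.stream only covers the initial run. If the event handler then submits the tool output via a polling helper such as submit_tool_outputs_and_poll, the SDK will produce exactly the GET loop in the logs above. A sketch of handling requires_action inside the handler with the streaming submission instead, assuming an AsyncOpenAI client and with my_function standing in for your tool (illustrative):

from openai import AsyncOpenAI, AsyncAssistantEventHandler

client = AsyncOpenAI()

class EventHandler(AsyncAssistantEventHandler):
    async def on_event(self, event) -> None:
        if event.event == "thread.run.requires_action":
            calls = event.data.required_action.submit_tool_outputs.tool_calls
            outputs = [{"tool_call_id": c.id, "output": my_function(c)} for c in calls]
            # Submit the outputs as a stream rather than a poll loop.
            async with client.beta.threads.runs.submit_tool_outputs_stream(
                thread_id=self.current_run.thread_id,
                run_id=event.data.id,
                tool_outputs=outputs,
                event_handler=EventHandler(),
            ) as stream:
                await stream.until_done()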

@edwinarbus Hi Edwin - can someone from OpenAI provide some feedback here please?