Assistants API run needs webhook

When you execute a run, it is queued, and according to the documentation:
"You can periodically retrieve the Run to check on its status to see if it has moved to completed."

This is a good candidate for a webhook. Instead of polling for results, it would be better for the system to inform me (via a webhook) when the task has completed.
Thanks,
Jonathan Ekwempu

18 Likes

I think webhooks would be great for this! Until we have a better solution than polling, do you currently have any strategy for polling results? I planned on doing it every few seconds, but I don’t want to keep users waiting for the answer longer than necessary.

Also, I can’t seem to find this anywhere: are Assistants API calls free? Does it matter (pricing-wise) if I poll every second or every 10 seconds?

Can’t the model be instructed to do that now via a function?

1 Like

I don’t see how, do you maybe have an example?

I have not done function calling myself so far, as I’ve not had the need. But this would be my approach:

According to the Assistants API overview, “the API allows you to define custom function signatures, with similar behavior as our function calling feature.”

So, in my API instructions, I would include a prompt to “please send notification of your response” or something to that effect. I would then include a function which will send the current thread ID to some webhook.

That webhook would be designed to simply send a polling request for the thread ID, which should, theoretically, return the assistant’s response.
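
Something like this is what I have in mind (untested; the function name, model, and webhook URL are all my own placeholders, not anything from the docs):

```python
# Untested sketch: notify_webhook, the model name, and the URL below are
# placeholders for illustration only.
import requests
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",
    instructions=(
        "After you finish answering, call notify_webhook with the current "
        "thread ID so our system knows a response is ready."
    ),
    tools=[{
        "type": "function",
        "function": {
            "name": "notify_webhook",
            "description": "Notify our backend that a response is ready.",
            "parameters": {
                "type": "object",
                "properties": {"thread_id": {"type": "string"}},
                "required": ["thread_id"],
            },
        },
    }],
)

def notify_webhook(thread_id: str) -> None:
    # Our own endpoint; on receipt it would fetch the thread's messages.
    requests.post(
        "https://example.com/assistant-webhook",  # placeholder URL
        json={"thread_id": thread_id},
        timeout=10,
    )
```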

Of course, someone who has actually used functions will be able to provide a more detailed example.

Hi @SomebodySysop, thanks for your feedback. I don’t think using functions is a suitable solution for this kind of problem. You can even see from your response that the “webhook should be designed to do some form of polling”. That really negates the whole purpose of a webhook. Ideally, you “push” to a webhook. A webhook does not really request data. Any solution that requires polling is not right in my opinion.

1 Like

Oh, yes, you’re right. Just push the whole response back. Why didn’t I think of that.

Also, it’s apparently not functions with the Assistants API, it’s “actions”. Custom GPTs with Custom Actions - #3 by andrew.demerchant

Ok, I’m pretty sure that there is no way to call a webhook. The function-calling documentation describes calling a function in your program, not a function that exists on the OpenAI side. I had a bit of fun with GPTs in deciding this was the case. I made an “Assistants API Knower” GPT that has all of the current Assistants API documentation and a zip of the openai-python 1.3 code dumped into it. You have to constantly remind it to refer to its knowledge base, but it does a good job of helping write Assistants API code.

Ugh, I can’t add links to a post, so I can’t share it. I’ll tweet about it – @johnnylambada

1 Like

As mentioned on https://platform.openai.com/docs/assistants/how-it-works/runs-and-run-steps, “… You can check the status of the run each time you retrieve the object to determine what your application should do next.”

Background: We are building a support assistant, leveraging the Retrieval tool and the Assistants API in the backend. We will be using a bot from one of the third-party chat solutions and configuring our platform bot in it. The bot will receive end users’ queries and then call the Assistants API (with the Knowledge Retrieval tool; we’re adding a knowledge-base file to the Assistant for augmentation) before asynchronously responding to the end user’s message.

Question: How do we design a performant and scalable solution where our backend server receives end users’ queries and asynchronously passes them to the Assistants API for responses? We’re looking for suggestions on how best to implement this polling mechanism, since it seems we need to poll for every run separately.

Currently, the only way seems to be manual polling:

  1. Put the prompt/message from the end user on an existing or new thread (beware cost issues with extending the life of a Thread for too long) using https://platform.openai.com/docs/api-reference/messages/createMessage

  2. Keep polling (https://platform.openai.com/docs/api-reference/runs/getRun) for the Run’s status to be one of the following: “requires_action”, “cancelled”, “failed”, “completed” or “expired”. You can set the polling interval to something sensible (200ms-1000ms), depending on how much extra time you are willing to let your user wait. Responses typically take several seconds to generate anyway, so a 1000ms polling interval should be fine. You can do all this in a separate thread, to avoid it blocking the end user. (A sketch of this loop follows after the list.)

  3. Process the retrieved non-running Run

3a) If status is “requires_action”, it wants you to run one or more tools (functions). Apply the parameter payload to the functions it wants you to call, and return the results (https://platform.openai.com/docs/api-reference/runs/submitToolOutputs) when all functions have been processed. This returns a new Run; process this as in the previous step. This could possibly lead to recursive calls to your function that processes Runs.

3b) If status is “completed”, retrieve the latest message(s) from the thread and present them to the user. You can utilize the before and after parameters and compare the timestamps in the Message objects to determine which messages to ask for (e.g. what to show to the end user, as the list of messages will also contain previously shown messages).

3c) Handle statuses “cancelled”, “failed” and “expired” by giving proper feedback to your end-user.
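
Put together, the polling loop could look roughly like this (a minimal sketch in Python with the 1.x openai client; TOOL_FUNCTIONS is a placeholder for your own function registry):

```python
import json
import time

from openai import OpenAI

client = OpenAI()

# Placeholder: map tool names the assistant may request to your own callables.
TOOL_FUNCTIONS = {}

def run_until_done(thread_id: str, run_id: str, interval: float = 1.0):
    """Poll a Run until it reaches a terminal status, handling tool calls."""
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status == "requires_action":
            # Step 3a: execute each requested function and return the outputs.
            outputs = []
            for call in run.required_action.submit_tool_outputs.tool_calls:
                fn = TOOL_FUNCTIONS[call.function.name]
                result = fn(**json.loads(call.function.arguments))
                outputs.append({"tool_call_id": call.id, "output": str(result)})
            client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id, run_id=run_id, tool_outputs=outputs
            )
        elif run.status in ("completed", "cancelled", "failed", "expired"):
            # Steps 3b/3c: the caller decides what to show the end user.
            return run
        time.sleep(interval)

# Step 3b: after a "completed" run, fetch the newest messages on the thread:
# messages = client.beta.threads.messages.list(thread_id=thread_id, order="desc")
```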

If it is too costly to create a new thread for each end user, you can have one separate poller thread (or a thread pool) to which you submit each Run, and it takes care of the polling. It can provide the list of messages back to the main thread when completed, in the same way as the “one thread per user” implementation does. For instance (a minimal sketch using Python’s standard-library thread pool; handle_user_message and on_done are placeholders):
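
```python
from concurrent.futures import ThreadPoolExecutor

# One shared pool instead of one polling thread per end user.
poller = ThreadPoolExecutor(max_workers=8)

def handle_user_message(thread_id: str, run_id: str, on_done) -> None:
    # run_until_done is the polling helper sketched above; on_done receives
    # the terminal Run once polling finishes.
    future = poller.submit(run_until_done, thread_id, run_id)
    future.add_done_callback(lambda f: on_done(f.result()))
```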

But yeah, webhooks would be a better solution…

5 Likes

thanks @kristianv for your suggestions! Yeah, looking forward to the webhook release!

I don’t think polling works like this. I could be wrong. The Run will ‘run’ depending on server-side resources; the Assistant will add its message or messages to the thread one at a time, depending on the existing User messages in the thread. Once the Assistant has finished evaluating the thread, the Run is set to ‘completed’. I only ever get multiple Assistant messages per Run if the Assistant decided to use the code interpreter.

You don’t create a new Run after updating the run with tool outputs when status = ‘requires_action’; you use the same run_id from the previous step:

```python
# tool_calls comes from run.required_action.submit_tool_outputs.tool_calls
run = client.beta.threads.runs.submit_tool_outputs(
    thread_id=thread.id,
    run_id=run.id,  # the same run_id that reported requires_action
    tool_outputs=[
        {
            "tool_call_id": tool_calls[0].id,
            "output": "...",  # your function's result, as a string
        }
    ],
)
```

Perhaps using WebSockets or HTTP/2 SSE (Server-Sent Events) would be more appropriate.
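
For example, the server could poll OpenAI and push status changes to the browser over SSE, so at least the client side stops polling (an untested sketch, assuming a Flask backend; the route is made up):

```python
# Untested sketch: the server still polls OpenAI, but the browser just
# listens on one SSE stream instead of polling us repeatedly.
import json
import time

from flask import Flask, Response
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()

@app.route("/runs/<thread_id>/<run_id>/events")
def run_events(thread_id, run_id):
    def stream():
        last_status = None
        while True:
            run = client.beta.threads.runs.retrieve(
                thread_id=thread_id, run_id=run_id
            )
            if run.status != last_status:
                last_status = run.status
                # Push each status change to the client as an SSE event.
                yield f"data: {json.dumps({'status': run.status})}\n\n"
            if run.status in ("completed", "cancelled", "failed", "expired"):
                break
            time.sleep(1)
    return Response(stream(), mimetype="text/event-stream")
```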

That’s exactly what I was thinking!! Everything would be smoother with a webhook! :slight_smile:

I can’t wait for webhooks to be implemented for that. As an interim remedy for all this polling stuff, I wrote a small library that handles polling the OpenAI Threads API, so I don’t need to rewrite this code each time I use the Assistants API. It’s on npm already and it’s called @tmlc/openai-polling

Why aren’t they offering a webhook? Polling consumes more resources on both ends.