When you execute a run, it is queued, and according to the documentation:
"You can periodically retrieve the Run to check on its status to see if it has moved to completed."
This is a good candidate for a webhook. Instead of polling for results, it would be better for the system to inform me (via a webhook) when the task has completed.
Thanks,
Jonathan Ekwempu
I think webhooks would be great for this! Until we have a better solution than polling, do you currently have any strategy for polling results? I planned on doing it every few seconds (roughly the sketch below), but I don't want to keep users waiting for the answer longer than necessary.
Also, I can't seem to find this anywhere: are Assistants API calls free? Does it matter (pricing-wise) if I poll every second or every 10 seconds?
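For reference, the "every few seconds" plan is just a loop like this (a minimal sketch assuming the openai Python package, v1.x; the interval is a placeholder):

```python
import time

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def wait_for_run(thread_id: str, run_id: str, interval: float = 2.0):
    """Poll every few seconds until the Run leaves its transient states."""
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status not in ("queued", "in_progress", "cancelling"):
            return run  # requires_action, completed, cancelled, failed or expired
        time.sleep(interval)
```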
I have not done function calling myself so far, as I haven't had the need, but this would be my approach:
According to the Assistants API overview, "the API allows you to define custom function signatures, with similar behavior as our function calling feature."
So, in my API instructions, I would include a prompt like "please send notification of your response", or something to that effect. I would then include a function which sends the current thread ID to some webhook.
That webhook would be designed to simply send a polling request for that thread ID, which should, theoretically, return the assistant's response.
Of course, someone who has actually used functions will be able to provide a more detailed example.
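To make the idea concrete, here is a rough, untested sketch (openai-python v1.x; the webhook URL and the notify_webhook function name are made up, and note the function call still surfaces on your side as a "requires_action" Run):

```python
import json

import requests  # the webhook endpoint below is hypothetical
from openai import OpenAI

client = OpenAI()

# An Assistant whose instructions ask it to "send notification of your response"
# via a custom function that carries the thread ID.
assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",
    instructions="After answering, call notify_webhook with the current thread id.",
    tools=[{
        "type": "function",
        "function": {
            "name": "notify_webhook",  # made-up name for this sketch
            "description": "Send the current thread ID to our webhook",
            "parameters": {
                "type": "object",
                "properties": {"thread_id": {"type": "string"}},
                "required": ["thread_id"],
            },
        },
    }],
)

def forward_to_webhook(run, thread_id: str):
    # Function calls are executed client-side, so our own code still has to
    # notice the "requires_action" status and do the forwarding.
    for call in run.required_action.submit_tool_outputs.tool_calls:
        if call.function.name == "notify_webhook":
            args = json.loads(call.function.arguments)
            requests.post("https://example.com/assistant-webhook",
                          json={"thread_id": args.get("thread_id", thread_id)})
```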
Hi @SomebodySysop, thanks for your feedback. I don't think using functions is a suitable solution for this kind of problem. You can even see from your response that the "webhook should be designed to do some form of polling". That really negates the whole purpose of a webhook. Ideally, you "push" to a webhook; a webhook does not request data. Any solution that requires polling is not right, in my opinion.
Ok, I'm pretty sure that there is no way to call a webhook. The documentation around function calling describes calling a function in your program, not a function that exists on the OpenAI side. I had a bit of fun with GPTs deciding this was the case: I made an "Assistants API Knower" GPT that has all of the current Assistants API documentation and a zip of the openai-python 1.3 code dumped into it. You have to constantly remind it to refer to its knowledge base, but it does a good job of helping write Assistants API code.
Ugh, I can't add links to a post, so I can't share it. I'll tweet about it: @johnnylambada
Background: We are building a support assistant, leveraging the Retrieval tool and the Assistants API in the backend. We will be using a bot from one of the third-party chat solutions and configuring our platform bot in it. The bot will receive end users' queries and then leverage the Assistants API (with the Knowledge Retrieval tool; we're adding a knowledge-base file to the Assistant for augmentation) before "asynchronously" responding to the end user's message.
Question: How do we design a performant and scalable solution where our backend server receives end users' queries and asynchronously passes them to the Assistants API for responses? We're looking for suggestions on how best to implement this polling mechanism, since it seems we need to poll for every Run separately.
1) Create a Run for the Thread (https://platform.openai.com/docs/api-reference/runs/createRun).
2) Keep polling (https://platform.openai.com/docs/api-reference/runs/getRun) for the Run's status to become one of the following: "requires_action", "cancelled", "failed", "completed" or "expired". You can set the polling interval to something sensible (200ms-1000ms), depending on how much extra time you are willing to let your user wait. Responses typically take several seconds to generate anyway, so a 1000ms polling interval should be fine. You can do all this in a separate thread, to avoid it blocking the end user.
3) Process the retrieved non-running Run (see the sketch below):
3a) If the status is "requires_action", the Run wants you to execute one or more tools (functions). Apply the parameter payload to the functions it wants you to call, and submit the results (https://platform.openai.com/docs/api-reference/runs/submitToolOutputs) when all functions have been processed. This returns the Run again; process it as in the previous step. This could possibly lead to recursive calls to your function that processes Runs.
3b) If the status is "completed", retrieve the latest messages from the thread and present them to the user. You can use the before and after parameters and compare the timestamps in the Message objects to determine which messages to ask for (i.e. what to show to the end user, since the list of messages will also contain previously shown messages).
3c) Handle the "cancelled", "failed" and "expired" statuses by giving proper feedback to your end user.
If it is too costly to create a new thread for each end user, you can have one separate poller thread (or a thread pool) to which you submit each Run, and which takes care of the polling. It can provide the list of messages back to the main thread when completed, in the same way the "one thread per user" implementation does.
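A minimal sketch of steps 2 and 3 in Python, assuming the openai package (v1.x); dispatch_function is a hypothetical stand-in for your own tool dispatcher:

```python
import time

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

NON_RUNNING = {"requires_action", "cancelled", "failed", "completed", "expired"}

def dispatch_function(name: str, arguments: str) -> str:
    """Hypothetical stand-in: execute the tool `name` with JSON `arguments`."""
    raise NotImplementedError

def poll_run(thread_id: str, run_id: str, interval: float = 1.0):
    """Step 2: poll until the Run reaches a non-running status."""
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in NON_RUNNING:
            return run
        time.sleep(interval)

def process_run(thread_id: str, run):
    """Step 3: handle the retrieved non-running Run."""
    if run.status == "requires_action":  # 3a
        outputs = []
        for call in run.required_action.submit_tool_outputs.tool_calls:
            result = dispatch_function(call.function.name, call.function.arguments)
            outputs.append({"tool_call_id": call.id, "output": result})
        run = client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread_id, run_id=run.id, tool_outputs=outputs)
        # Submitting outputs resumes the Run: poll and process again (recursion).
        return process_run(thread_id, poll_run(thread_id, run.id))
    if run.status == "completed":  # 3b: newest messages first
        return client.beta.threads.messages.list(thread_id=thread_id, order="desc")
    raise RuntimeError(f"Run ended with status {run.status}")  # 3c
```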
I don't think polling works like this, but I could be wrong. The Run will "run" depending on server-side resources; the Assistant will add its message or messages to the thread one at a time, depending on the existing user messages in the thread. Once the Assistant has finished evaluating the thread, the Run is set to "completed". I only ever get multiple Assistant messages per Run if the Assistant decides to use the code interpreter.
I can't wait for webhooks to be implemented for that. As an intermediate remedy for all this polling stuff, I wrote a small library that does polling on the OpenAI Threads API, so I don't need to rewrite this code each time I use the Assistants API. It's on npm already; it's called @tmlc/openai-polling.