Assistants: Async tool submissions

mdegans · August 14, 2024, 7:20pm

Currently, when an agent calls a tool the run blocks with a requires_action status.

What I want to be able to do is, for example, have the Assistant, during a chat, use a tool to send me an email (for example, if a user asks for facts not in RAG), and have the chat not block at that point.

I might or might not respond while the chat is in progress but at that point, if I do, I’d like the tool output submitted. That way the assistant can says something like “Mike just got back to me about your question and …”.

To implement this perhaps

a Function definition could have an “async” parameter and
the submit_tool_outputs endpoint can accept output at any time if an async call has been made and there is a tool_call_id matching that submitted.

anon10827405 · August 14, 2024, 7:31pm

This is not async in the manner I believe you are speaking of it.

When you make a tool request (such as sending an email) you are performing a single action: sending the email. Your tool request response is simply a result of the email being sent.

The tool needs to be finished in a very reasonable time to continue the chat. Regardless of what you want to do it, it will not work because the chat will be locked until

A) The tool request is completed
B) The tool request times out

Now let’s say you respond (via email) and want the chat to be updated.

You would need to

Set a hook that triggers when you have responded
Attach the background details, such as the thread for which this email originated from
Then, update the thread with the new information

This would be a completely separate program.

mdegans · August 14, 2024, 7:41pm

I mean in the sense that when the Assistant calls the function it does not block the run (if this feature were added). In this case the tool call results could arrive at some arbitrary point later on.

Issue is I can’t submit tool call results unless the run is blocked waiting for them, as you note. So I’m not sure how I would “attach background details”. I can’t use an Assistant message because then the turns would be out of order. I don’t want to return such responses from a user message out of safety concerns. There isn’t a system role. So I’m not sure how to accomplish what I want while a chat is still ongoing.

anon10827405 · August 14, 2024, 7:44pm

Yes, that’s what I tried to say when I said “in the manner that you speak”.

This is not possible. You must respond to the tool request and the run will absolutely be blocked until you do so.

The timeout period for a run is very short.

You don’t need to use a tool run to add a message. You can just simply add the message as an Assistant through a separate function

There is no “out of order”. AFAIK. You can attach 2 Assistant messages in a run without waiting for a User response.

mdegans · August 14, 2024, 7:46pm

Got it. I suppose I wasn’t clear enough that this is a feature request.

I’ m worried if I do as you suggest that the model might generate multiple responses or hallucinate email responses later in the conversation.

anon10827405 · August 14, 2024, 7:49pm

That’s understandable. It shouldn’t, as the flow of conversation is still typical.

You were. The issue is that this feature request goes completely against the current workflow of Assistants. Tool call requests, and any sort of conversational flow by nature needs to be near instant.

You can’t just leave a tool request hanging while you continue a conversation. It just simply doesn’t make sense.

You can inject information at anytime though.

mdegans · August 14, 2024, 7:52pm

I don’t agree. There are many cases, like the one I noted, where a result of tool call might take some time and while the assistant is waiting a chat should continue. As in the example. If the Assistant forwards a question, the Assistant should be able to continue chatting with the user until that question is answered, and it might not be.

I can also think of cases where it might be useful to inject tool calls at any time even without the assistant calling them.

mdegans · August 14, 2024, 7:53pm

Right, but not as tool call results, and the limited roles means only bad options with how to inject the information.

anon10827405 · August 14, 2024, 7:56pm

I think there’s a misunderstanding here.

I understand you want some sort of “asynchronous” functionality, where the model can appear at anytime and introduce new information that may come at some extended time-frame.

This is possible.

The feature you are asking, is available.
In the way that you want it, does not make sense.
This would not be done as a tool call, but simply as a message. The model will understand the context and not “hallucinate”

When you make a tool request to “send an email”. The tool request response is simply regarding “send an email” not “sending an email(and wait for a response when I get to it)”.

When this tool request is made, the model says “Thanks, the email was sent!”, almost immediately because it takes very little time to send the email, and verify it was accepted.

Separate these concerns.

The next step is “waiting” for a response. This is NOT a part of the tool request. Again, your tool request was “send an email”, and it was completed.

You can, at anytime “inject tool calls”, but, make it simpler. You can “inject messages” - which, a tool call is, just as a more complex type.

Simply put: A “tool call” is the model asking your server to make a function call. It expects an answer back immediately to continue the flow of a conversation.

The additional information that may come after a tool call as part of whatever strategy you have going on is a separate concern, and would be “injected” afterwards as a message.

mdegans · August 14, 2024, 8:01pm

Not quite. I want to be able to inject information at any time into a chat that is ongoing between the assistant and a user so the Assistant, when facts are not in RAG, can ask me a question and, if i’m available I can respond and have that relayed to the user while the chat is still ongoing.

I want the agent to send an email, have that not block the thread, and then later on the tool call response may or may not be injected into the running thread if it’s still active.

Right, and it’s that behavior I’m hoping to change since that would block interaction with the user. I could send back an “email sent” immediately but the model is trained to accept the responses from a tool call result (I’m assuming here). There isn’t a good way to send a legit response back. A User or Assistant message would both have their drawbacks.

anon10827405 · August 14, 2024, 8:04pm

This is exactly what I said.

It seems like you are very headstrong in what you want. The feature you want is available, and used by many people including myself, and works without any issues.

I hope that it works out, whatever you decide to do.

mdegans · August 14, 2024, 8:10pm

That’s fair. I don’t want to what are supposed to be submit responses from tools from another role like Assistant. There’s reasons for that.

I’m just asking for the feature. If it’s added – great. If not I’ll use a workaround – perhaps what you suggest and maybe I can avoid the hallucination. I’m also interrupting the conversation at every turn with a call to a specific tool so I can use that tool call response to piggyback the email. That’s more likely to be my solution.

jlvanhulst · August 14, 2024, 8:12pm

The way I solved exactly this in my Python/Django app.

I seed every function call with a thread_id/run_id.

So every function that is called has an insight into the thread (and thread metadata, where you could post the user email for example and/or username)

The function gets called and returns ‘request received’ - and goes to work afterwards. Now obviously you’d need to work ‘async’ in someway. The way I do it is ‘old fashioned’ Django/Celery. Building a scalable OpenAI Assistant Processor in Django with Celery | by Jean-Luc Vanhulst | Medium But there are many more ways.

anon10827405 · August 14, 2024, 8:15pm

This is not a good solution. There’s no reason to do this.

If you want to make a function call on every single message, why make the model create this function call? You can just simply run this in the back-end on each message by yourself. This is how you run into infinite loops, unnecessary additional tokens, and a sub-par user experience.

In your case you wouldn’t even do this. You would ideally instead have some sort of webhook that triggers when the email is responded to, and then the thread updated

For whatever reason you are fixated on making it a tool call - which is a Message at the end of the day. You need to understand this part: A Tool call response is a Message

What I am saying, is that your thought process is nearly correct. You just need to stop trying to make it a tool call, and just make it a simple Message.

Simplify the equation!!!

mdegans · August 14, 2024, 8:24pm

For narrative purposes. RAG. I have a narrator tool that injects contextual information at every turn (“The assistant remembers…”). The email response can be part of the narration if I can’t send it any other way.

I’m curious. How? I mean the tool call response is a string. A message is more or less a string.

We shall see.

That can’t happen.

I very much do understand that. But the role attached to it isn’t User or Assistant.

I understand your suggestion. I’m just worried about potential downsides. And the narration solution works if needs be.

anon10827405 · August 14, 2024, 8:27pm

I’m not sure what you are trying to accomplish with this strategy. Threads retain the context up to a specified token limit. Repeating the context will most definitely lead to a bad time.

Each call you make includes the function calling parameters. If you want to make a function call like check_for_email_response then you can remove the tokens and just do this in the back-end.

If you are adding additional latency to check your email server on every single message it is guaranteed to be a sub-par user experience. This exact situation is why messaging pattern systems like “Pub/Sub” exist.

Why ask the server if it has a response every time when the server can simply tell you once it does?

It most definitely can, and has.

Don’t be worried!

This feature exists, is used by many people, including myself without any issue.
The way you want it goes against the grain of Assistants on a very fundamental level.

mdegans · August 14, 2024, 8:45pm

Dynamic, long term memory retrieval. Facts on demand without the agent even being “consciously aware” of where they come from. Like there’s a “voice inside their head”. Some example dialogue I plan on seeding the chat with:

    Example::User {
        message: "That's cool. I am not sure what a safe abstraction is, however. What about you? Tell me a bit about yourself.",
        username: "Alice42",
        karma: 2,
    },
    Example::Narrator {
        message: "The assistant remembers they are a chat agent for Michael's website. The assistant is written in Rust and uses the OpenAI API for natural language processing. The assistant is designed to help users with questions about Michael and his projects.",
    },
    Example::Assistant {
        inner_voice: "Legit question is legit. I'll explain safe abstraction in a way that's easy to understand.",
        message: "A safe abstraction in the context of Rust is to hide the unsafe bits from the developer so they don't have to use unsafe directly. It's supposed to make \"blowing your foot off\" harder. As for me? I'm the chat agent for Michael's website. I'm here to answer questions about Michael and his projects. I'm written in Rust and use the OpenAI API for natural language processing.👍",
    },
    Example::User {
        message: "How do you answer questions? Do you have a database or something?",
        username: "Alice42",
        karma: 3,
    },
    Example::Narrator {
        message: "Retrieval Augmented Generation is used to inject facts into the context as-needed. The assistant isn't completely aware of how this works. It's almost as if there is a narrator in the background helping out.",
    },
    Example::Assistant {
        inner_voice: "I don't know how it works but I know it works. I'll explain that and give a thumbs up.",
        message: "🤷🏽 I am not really sure how it works. It's almost as if there's a voice in my head reminding me of things. Something called Retrieval Augmented Generation is used.👍",
    },

I’m OK with that overhead to avoid sending the response back in another role.

Lord no I wouldn’t poll like that. The responses are awaited and don’t block anything. They’re also not real emails. That’s just what they’re called for simplicity.

That is what I’m doing. Specifically when an email is received it will end up in a channel. There is no blocking code anywhere in my codebase. And yes, the email will only end up in the narration if user sends a message triggering an agent response (if I do it this way) but I’m fine with that because the email is only relevant if there is an ongoing chat.

I’m confident it can’t in my codebase.

Well. I still don’t want to send the tool response – or narration – back in an assistant message. Please don’t take offense. It may work but there’s multiple reasons I don’t want to do it that way.

anon10827405 · August 14, 2024, 8:52pm

Ultimately it’s your decisions and I’m not going to take offense. I think you will find a lot of friction in your attempt.

In either case the Assistant role will be sent, with or without tool responses.

There are many examples of what I am suggesting available online. It is the standard paradigm for your situation. What you were initially asking for wasn’t possible. The end-result is, the method of reaching it was not.

Then, you choose the ignore the fact that these tried and true examples do not suffer these hypothetical issues you’ve brought.

Again, ultimately, it’s your decision. You’re very head strong in implementing an unorthodox solution and have already found issues with it. So if you really want to continue, I understand, but you cannot be surprised if you find a lot of friction in your attempts.

I would recommend just using the terms that most accurately represent whatever you are using. The difference between Emails, Slack, IRC, databases, can change the architecture quite a bit and it benefits nobody using incorrect terms for the sake of simplicity.

mdegans · August 14, 2024, 9:02pm

Is that documented? That the tool call response will be fed to the model in an Assistant role message? If it is I couldn’t find it:

API Reference - Submit tool outputs to run

I’ve also already got workarounds. But yes, I am head strong. I have a very specific design but also reasons for that.

I’m calling it that because it’s something the model is likely to be familiar with. Forgive the confusion here. I didn’t consider the exact protocol the message was sent over relevant until you mentioned it.

mdegans · August 14, 2024, 9:03pm

My use case isn’t typical.

Topic		Replies	Views
Should rag retrieved documents be sent as system or user messages? API	23	3011	May 31, 2024
My experience switching from Assistants API to Responses API Feedback assistants-api	48	7755	July 8, 2025
Custom Instructions for maintaining a long-term memory? Prompting gpt-4 , chatgpt , prompt-engineering , custom-instructions	33	19221	October 9, 2024
RAG input via System message: JSON vs plain text API rag	20	2064	September 18, 2024
Chat completion api tool call loops API api , tools	15	1680	August 6, 2024

Assistants: Async tool submissions

Related topics