Optimal Approach for Managing Chat History in a Multi-Assistant System

I have two assistants that handle separate tasks. I classify the intent of each user query, using the query and the chat history, and select an assistant based on that intent.
Both assistants have different custom functions, and their function calls also get added to the message history.
My question is: what is the best way to segregate the chat history? Should it be a single thread containing both assistants’ messages and all user queries, or should I keep a separate thread per assistant?
My concern is that if I pass the tool call history or replies of one assistant to the other, it might start hallucinating.

Ideally, you should have separate threads for individual users. Otherwise, the assistant will have to deal with messages from multiple users, which will degrade the quality of responses.

Each user will have their own chat threads. My question is: should every user have two threads, one per assistant, or a single thread that both assistants write to?

Assistants maintain a context window.

To do so, they truncate older messages. For the remaining messages to stay relevant to each assistant’s conversation flow, an independent thread.id per assistant is ideal, in my opinion.

If I understand the original question correctly, let’s say I have two science tutoring assistants: the first one specialises in physics and the second one in chemistry. When I receive queries from users, I first determine whether the question is physics or chemistry related and then invoke the appropriate assistant.
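As a rough illustration of that routing step, here is a minimal sketch. The assistant IDs are placeholders and `keyword_classifier` is a toy stand-in I made up for whatever intent classifier you actually use (an LLM call, a trained model, and so on).

```python
from typing import Callable, Dict

# Hypothetical assistant IDs; real ones come from your own account.
ASSISTANT_IDS: Dict[str, str] = {
    "physics": "asst_physics_placeholder",
    "chemistry": "asst_chemistry_placeholder",
}


def keyword_classifier(query: str) -> str:
    """Toy stand-in for a real intent classifier."""
    chemistry_words = ("molecule", "reaction", "acid", "periodic")
    text = query.lower()
    return "chemistry" if any(word in text for word in chemistry_words) else "physics"


def pick_assistant(query: str, classify: Callable[[str], str] = keyword_classifier) -> str:
    """Return the ID of the assistant that should handle this query."""
    intent = classify(query)
    return ASSISTANT_IDS[intent]
```

Passing the classifier in as a parameter keeps the routing logic independent of how the intent is actually determined.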

If this is the case, there are a few approaches, each with its own pros and cons:

  1. Separate Threads - Separate Assistants: (Straightforward)

You can design your application so that the user has to explicitly start either a physics or a chemistry chat, and then use separate threads, each run with its corresponding assistant.
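One way to keep track of the separate threads could look like the sketch below. `ThreadRegistry` and `create_thread` are hypothetical names of mine; `create_thread` is injected so the sketch does not depend on any SDK, and in practice it might wrap a call that creates a thread and returns its ID.

```python
from typing import Callable, Dict, Tuple


class ThreadRegistry:
    """Maps (user_id, assistant_key) to a thread ID, creating threads lazily."""

    def __init__(self, create_thread: Callable[[], str]) -> None:
        # create_thread is an assumption: any callable that returns a new thread ID.
        self._create_thread = create_thread
        self._threads: Dict[Tuple[str, str], str] = {}

    def thread_for(self, user_id: str, assistant_key: str) -> str:
        """Return the existing thread for this pair, or create one on first use."""
        key = (user_id, assistant_key)
        if key not in self._threads:
            self._threads[key] = self._create_thread()
        return self._threads[key]
```

With this layout each user ends up with one thread per assistant, so neither assistant ever sees the other’s tool calls.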

  2. Same Thread - Separate Assistants: (Trickier solution)

This seems to be what you have been considering. According to their docs, OpenAI should, in theory, properly handle multiple assistants running on the same thread. I would assume they have considered that an assistant’s context will include responses from other assistants. You will have to try it out to see the nuances.

  3. Same Thread - Same Assistant: (Simple approach)

This would overload all the responsibilities on a single Assistant, but depending on how sophisticated your use case is, it might be a good and simple option. Some good prompt engineering might do the job.

Conclusion:

Without knowing all the specifics and constraints of your problem, it’s hard to tell which one is the best approach. You could spike out all the approaches and choose the one that works best for you.

When in doubt, start out with the simplest approach and then optimise once you know the problems. Approach 3 is the simplest; the other two approaches make sense as ways of addressing whatever issues you encounter with Approach 3.

A thread is a conversation session between an Assistant and a user. Threads store Messages and automatically handle truncation to fit content into a model’s context.

There is no limit to the number of Messages you can store in a Thread. Once the size of the Messages exceeds the context window of the model, the Thread will attempt to include as many messages as possible that fit in the context window and drop the oldest messages.
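To illustrate the "drop the oldest messages" idea, here is a minimal sketch. It is not how the service implements truncation; `approx_tokens` and `fit_to_window` are hypothetical names, and the word-count token estimate is an assumption made only to keep the example self-contained.

```python
from typing import List


def approx_tokens(text: str) -> int:
    """Crude token estimate (whitespace-separated words); a real tokenizer will differ."""
    return len(text.split())


def fit_to_window(messages: List[str], max_tokens: int) -> List[str]:
    """Keep the most recent messages whose combined estimate fits within max_tokens."""
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest so the oldest messages are the ones dropped.
    for message in reversed(messages):
        cost = approx_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

On a shared thread, the messages that survive this kind of cut may belong mostly to the other assistant, which is one reason context can get diluted.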

Whether you should use the same thread for multiple assistants really depends on how you plan to implement the assistants and what their intended purposes are.

Here’s one important thing to know:

Thread locks

When a Run is in_progress and not in a terminal state, the Thread is locked. This means that:

  • New Messages cannot be added to the Thread.
  • New Runs cannot be created on the Thread.

This means that at any given time, a user will only be able to use one assistant on the thread, no matter how many assistants have access to the thread.

Sharing one thread between several assistants will also dramatically increase the thread size and incur higher storage and retrieval costs.

Lastly, it could also lead to dilution of context.
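Because of the thread lock described above, an application usually has to wait for the current run before it can add a message or start another assistant. The sketch below assumes the set of terminal run statuses listed in the code and uses hypothetical helper names; `get_status` is injected, and in practice it might wrap a call that retrieves the run and returns its status.

```python
import time
from typing import Callable, Iterable

# Assumed terminal run statuses; check the current API reference before relying on them.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def thread_is_locked(run_statuses: Iterable[str]) -> bool:
    """Return True if any run on the thread is still in a non-terminal state."""
    return any(status not in TERMINAL_STATUSES for status in run_statuses)


def wait_for_run(get_status: Callable[[], str],
                 poll_seconds: float = 0.5, max_polls: int = 60) -> str:
    """Poll until the run finishes or needs tool outputs, then return its status."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES or status == "requires_action":
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("run did not finish within the polling budget")
```

Note that a run in requires_action still counts as active, so the thread stays locked until tool outputs are submitted or the run ends some other way.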

Thanks for the detailed answer.
I have tested the second approach, but it resulted in one assistant behaving like the other.
For the third approach, it is quite difficult to write a single system prompt, since both assistants do completely different tasks.
For the first approach, I need to figure out a way to carry the user’s context from one thread to another.

Hello,
I have been working on a chatbot for several weeks, using a single thread shared by several assistants. Specifically, there are 3 assistants:

  • a “guide” who is the one who initiates the conversation and determines the user’s intention
  • two “specialists”, each with a very different objective.

Either of them can pass control to either of the other two, depending on the user’s intent.

I find this approach very useful, because all three share the context, thanks to using the same thread. In my case, it is normal for a user to need both specialists in a single session with my chatbot. Since both of them make function calls that have some common parameters, if one assistant has already determined some of the common parameters, the other assistant does not need to ask the user again.

The downside is the growth of the number of tokens in the conversation, of course.

The other issue I’m struggling with is correctly identifying the intent so that the assistant switches at the right time. Since the Assistants API allows much less control over responses than the Chat Completions API, it is very difficult to give instructions precise enough that the assistants always behave the same way.

@Rafota How did you deal with this? My concept is pretty similar, creating an assistant that works as a proxy for sub-assistants, each of which is a function call. But the thread remains locked in “requires_action”, not letting me create another run with the sub-assistant.

Hello Andrés,

I apologize for the delay.

I have recently migrated my code from the Assistants API to the Chat Completions API, but I remember that the “requires_action” status means that you need to execute the function call. Take a look at this information: API Reference - OpenAI API and please tell me if that was your problem.

Hey @Rafota, no problem.

Yes, I’m submitting the tool outputs. My point is how to manage this back-and-forth between the assistants on the same thread.

I’ve created a topic specifically for this: Assistants API Multi Assistant Agentic Workflow

Thanks for the answer.

Hello Andrés,

My processing after receiving a “requires_action” status (note that the run pauses in this status until it receives the tool outputs) is:

  • execute the function (or functions) indicated by the run result in required_action.submit_tool_outputs.tool_calls,

  • call GPT_CLIENT.beta.threads.runs.submit_tool_outputs, passing the result of my function, so that the assistant can generate a response based on that result,

  • retrieve the message generated by the assistant and show it to the user.

I haven’t run into problems with this processing.
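The first two steps above could be sketched roughly as follows. This is not my exact code: `handle_requires_action` and the `handlers` mapping are hypothetical names, and `client` is assumed to be an OpenAI client object exposing `beta.threads.runs.submit_tool_outputs` as mentioned above.

```python
import json
from typing import Any, Callable, Dict


def handle_requires_action(client: Any, thread_id: str, run: Any,
                           handlers: Dict[str, Callable[..., Any]]) -> Any:
    """Execute each requested function locally and submit the results to the run."""
    tool_outputs = []
    for call in run.required_action.submit_tool_outputs.tool_calls:
        # The arguments arrive as a JSON string chosen by the model.
        args = json.loads(call.function.arguments or "{}")
        result = handlers[call.function.name](**args)
        tool_outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    # Submitting the outputs lets the run continue and produce the assistant's reply.
    return client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id, run_id=run.id, tool_outputs=tool_outputs
    )
```

Once the run completes, the assistant’s reply can be read from the thread’s messages and shown to the user.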


Regarding switching assistants, I add one extra function to every assistant, which it calls to tell me when I must switch. This function receives the user’s intent as an argument, and depending on that intent I select one assistant or another. You must instruct every assistant to call this function.

When the assistant requests this function, I process it without calling GPT_CLIENT.beta.threads.runs.submit_tool_outputs and without sending any message to the user. I simply submit the user’s last message to the new assistant, using the single thread that I have for that conversation.
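To give an idea of what such a switch function could look like, here is a sketch. The tool name `report_intent_change`, its description, and the helper `requested_intent` are placeholders of mine, not names from my project; only the general function-tool shape is assumed from the API.

```python
import json
from typing import Any, Iterable, Optional

# Hypothetical tool definition added to every assistant; name and wording are placeholders.
SWITCH_TOOL = {
    "type": "function",
    "function": {
        "name": "report_intent_change",
        "description": "Call this when the user's request belongs to another assistant.",
        "parameters": {
            "type": "object",
            "properties": {
                "intent": {"type": "string", "description": "The user's detected intent."}
            },
            "required": ["intent"],
        },
    },
}


def requested_intent(tool_calls: Iterable[Any]) -> Optional[str]:
    """Return the intent passed to the switch function, or None if it was not called."""
    for call in tool_calls:
        if call.function.name == SWITCH_TOOL["function"]["name"]:
            return json.loads(call.function.arguments).get("intent")
    return None
```

If `requested_intent` returns a value, the application can pick the matching assistant instead of executing a regular function.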

I hope this can help you.

Btw: Do you speak Spanish? I am Spanish.

Regards.

I’m Brazilian.

Appreciate your detailed answer. However, I am unable to replicate what you described.

When you say that you proceed without submit_tool_outputs, does that mean the run is in a ‘requires_action’ state because of the function indicating the need to switch assistants?

With that being said, when I try to replicate this via the API: let’s say I have a function in an assistant, send_to_X_agent, that indicates the need to switch to another assistant. The 1st assistant enters a requires_action state indicating the intent to switch. Then I run the 2nd assistant on the same thread, but I get the error Thread thread_X already has an active run run_Y, because the 1st assistant’s run is still active in the requires_action state.

Sorry if I’m not making myself clear.

Hello Andres,

I have just gone back to the project where I was using the Assistants API and reviewed that code.

I see that it is necessary to cancel the active run (https://platform.openai.com/docs/api-reference/runs/cancelRun) if you are not going to call GPT_CLIENT.beta.threads.runs.submit_tool_outputs.
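In code, the cancel-then-switch step could look roughly like this. It is a sketch, not my original code: `hand_off` is a hypothetical helper, `client` is assumed to be an OpenAI client exposing `beta.threads.runs.cancel`, `retrieve` and `create`, and the polling loop is my assumption that cancellation may take a moment to complete.

```python
import time
from typing import Any

# Assumed non-terminal statuses during which the thread is still locked.
ACTIVE_STATUSES = ("queued", "in_progress", "requires_action", "cancelling")


def hand_off(client: Any, thread_id: str, active_run_id: str, next_assistant_id: str,
             poll_seconds: float = 0.5, max_polls: int = 20) -> Any:
    """Cancel the run waiting in requires_action, then start a run with another assistant."""
    runs = client.beta.threads.runs
    runs.cancel(thread_id=thread_id, run_id=active_run_id)
    # Wait until the old run has actually left its active state.
    for _ in range(max_polls):
        status = runs.retrieve(thread_id=thread_id, run_id=active_run_id).status
        if status not in ACTIVE_STATUSES:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("the previous run did not reach a terminal state")
    # The thread is unlocked now, so the next assistant can run on it.
    return runs.create(thread_id=thread_id, assistant_id=next_assistant_id)
```

The new run sees the same thread, so the user’s last message is already there for the second assistant to answer.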

Sorry for the mistake

