I have an app with one “master” assistant and a dedicated thread for each user. Many users may be using the app at the same time, so the assistant needs to be able to run on every user’s thread simultaneously, without delays.
Is this currently possible, and is it how things work today? That is, if I invoke an assistant on different threads at the same time, do those executions run in parallel, or not?
If not, and thread invocations can only run one at a time, that’s a problem, and I’m guessing I’d have to assign one assistant per user to get parallelism?
Threads are created per user or per session. You can run as many threads as you need under your master assistant, and runs on separate threads execute concurrently. Hope that answers your question (it’s not bad)!
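To make the "one assistant, many threads" pattern concrete, here is a minimal client-side sketch in Python. It is an illustration only: `run_assistant_on_thread` is a hypothetical stand-in that just sleeps instead of calling the real API, and the IDs are made up. It shows how an app might start one run per user thread without waiting for the others; it does not prove anything about how the service schedules work on its side.

```python
import asyncio
import time


async def run_assistant_on_thread(assistant_id: str, thread_id: str, delay: float = 0.2) -> dict:
    # Hypothetical stand-in for "create a run on this thread and wait for it".
    # A real app would call the Assistants API here; this only sleeps.
    await asyncio.sleep(delay)  # simulates network and model latency
    return {"assistant_id": assistant_id, "thread_id": thread_id, "status": "completed"}


async def run_for_all_users(assistant_id: str, thread_ids: list[str]) -> list[dict]:
    # One coroutine per user thread; gather() lets the waits overlap
    # and returns results in the same order as thread_ids.
    tasks = [run_assistant_on_thread(assistant_id, t) for t in thread_ids]
    return await asyncio.gather(*tasks)


def demo() -> tuple[list[dict], float]:
    # Five made-up user threads sharing one made-up assistant ID.
    thread_ids = [f"thread_user_{i}" for i in range(5)]
    start = time.perf_counter()
    results = asyncio.run(run_for_all_users("asst_master", thread_ids))
    return results, time.perf_counter() - start
```

Calling `demo()` should take roughly as long as a single simulated run rather than five in sequence, because the five waits overlap. With a real client you would still be subject to whatever rate limits apply to your account.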
Tip: you can also change the behaviour of each thread at run time without affecting the others. Do you need more information on this?
It depends entirely on how you have configured your assistant, but the basic premise is that you can now add more context to each thread as the conversation progresses!
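As a rough sketch of that per-thread context idea, the helper below builds the arguments for a single run. It assumes the run-creation call accepts an `additional_instructions` field, as `client.beta.threads.runs.create` in the OpenAI Python SDK does at the time of writing; check the current API reference before relying on it. `MASTER_ASSISTANT_ID`, `build_run_params`, and the thread IDs are hypothetical names for illustration.

```python
from typing import Optional

MASTER_ASSISTANT_ID = "asst_master"  # hypothetical ID of the shared assistant


def build_run_params(thread_id: str, extra_context: Optional[str] = None) -> dict:
    # Every run points at the same assistant but at its own thread.
    params = {"assistant_id": MASTER_ASSISTANT_ID, "thread_id": thread_id}
    if extra_context:
        # Assumed per-run field: adds context for this run only, leaving the
        # assistant's stored instructions and other threads untouched.
        params["additional_instructions"] = extra_context
    return params


# Two users, same assistant, different run-time context.
alice_run = build_run_params("thread_alice", "The user prefers short answers.")
bob_run = build_run_params("thread_bob")
```

With a real client, each dict would then be passed along the lines of `client.beta.threads.runs.create(**alice_run)`. Because the extra context travels with the individual run, nothing set for one user's thread leaks into another's.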
As per this, it seems it’s possible. My architecture is set up this way, and I’m trusting that requests made at the same time to separate threads really are processed concurrently.