Heh heh, now this is a fun topic.
Opening Pandora’s box on parallelism is going to end up being more complicated than you might expect, if that’s your goal.
The API calls themselves are async (using asyncio, I believe), so the calls can be issued concurrently. That wouldn’t be the danger here.
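To make the concurrency point concrete, here’s a minimal sketch. `fake_api_call` is a hypothetical stand-in for an awaited SDK call (the real one would hit the network); the shape of the `gather()` usage is the part that matters:

```python
import asyncio
import time

# Hypothetical stand-in for an async API call; a real async SDK call
# would be awaited the same way.
async def fake_api_call(prompt: str) -> str:
    await asyncio.sleep(0.2)  # simulate network latency
    return f"response to {prompt!r}"

async def main() -> list[str]:
    start = time.perf_counter()
    # gather() runs the coroutines concurrently on one event loop --
    # concurrency, not true parallelism, but the waits overlap.
    results = await asyncio.gather(
        fake_api_call("a"), fake_api_call("b"), fake_api_call("c")
    )
    elapsed = time.perf_counter() - start
    assert elapsed < 0.5  # ~0.2s total, not 0.6s, because the waits overlap
    return results

results = asyncio.run(main())
```

Three 0.2-second “requests” complete in roughly 0.2 seconds total, because the event loop switches between coroutines whenever one is waiting.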
Honestly, there’s no better way to know whether this works for your use case than to try it yourself. At that point you’ll be able to tell whether or not the API allows what you’re doing (my guess is it should).
The real danger is that true, genuine multithreading / parallelism doesn’t necessarily work the way you think if you’re building this in Python. In fact, “real” parallelism isn’t natively possible at all because of Python’s Global Interpreter Lock (the GIL). It’s possible in other languages, but the vast majority of people are building in Python, hence the assumption here.
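Here’s a small sketch of the nuance: threads in Python *do* overlap for blocking I/O, because the GIL is released while waiting, even though they wouldn’t speed up CPU-bound work:

```python
import threading
import time

results = []

def io_task(n: int) -> None:
    # time.sleep releases the GIL, so these "waits" overlap across threads,
    # just like blocking network I/O would.
    time.sleep(0.2)
    results.append(n)

start = time.perf_counter()
threads = [threading.Thread(target=io_task, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four 0.2s sleeps finish in ~0.2s total, not 0.8s.
# A CPU-bound loop in each thread would NOT speed up this way,
# because only one thread can hold the GIL and run bytecode at a time.
assert elapsed < 0.6
```

So threads buy you overlapping waits, not overlapping computation; that’s the distinction the GIL enforces.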
I can already see people debating me on this, because Python can achieve equivalent results for use cases with high I/O throughput (like OAI’s API), but trying to call that parallelism requires an “um, actually” I can’t help but make here lol.
That being said, answering your question requires some context about how you expect the assistant to be called and from where. It also may not be necessary to get into the nitty-gritty of it, depending on what you’re expecting to build. For most people, it is not going to matter whether you start two functions within ms of each other as coroutines vs. simultaneously on different threads. What is going to matter is that your awaited calls don’t block the execution of the rest of your code while the request finishes.
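That non-blocking behavior is easy to demonstrate. In this sketch, `slow_request` is a hypothetical stand-in for an awaited API request; while it waits, a background coroutine keeps making progress instead of being frozen:

```python
import asyncio

ticks = 0

async def background_worker() -> None:
    # Keeps making progress whenever other coroutines are awaiting.
    global ticks
    for _ in range(10):
        ticks += 1
        await asyncio.sleep(0.01)

async def slow_request() -> str:
    # Stand-in for an awaited API request; while this coroutine waits,
    # the event loop runs other coroutines rather than blocking.
    await asyncio.sleep(0.1)
    return "done"

async def main() -> str:
    worker = asyncio.create_task(background_worker())
    result = await slow_request()
    await worker
    return result

result = asyncio.run(main())
```

If `slow_request` were a blocking call (e.g. a synchronous HTTP request), the worker would make no progress until it returned; with `await`, the event loop interleaves them.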
If your concern is more about handling user requests concurrently, that’s a “big picture” problem, and is where client/server relationships come into play. This is what people mean when they talk about things like “load balancing” and so on. You would be in charge of accepting async messages in your code and, well, handling them and directing where those requests should go in the rest of your code lol.
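A minimal sketch of that “accept async messages and route them” idea, using asyncio’s built-in streams API: each client connection gets its own coroutine, so the server handles clients concurrently. The routing step is a placeholder comment; a real handler would dispatch into the rest of your program:

```python
import asyncio

async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    data = await reader.readline()
    # Here you'd route the request (e.g. look up which assistant/handler
    # it belongs to); this sketch just echoes it back.
    writer.write(b"handled: " + data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def ask(port: int, msg: bytes) -> bytes:
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(msg + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply

async def main() -> list[bytes]:
    # Port 0 lets the OS pick a free port.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    # Two clients served concurrently by the same server.
    replies = await asyncio.gather(ask(port, b"user-1"), ask(port, b"user-2"))
    server.close()
    await server.wait_closed()
    return replies

replies = asyncio.run(main())
```

In practice you’d reach for a web framework rather than raw sockets, but the shape is the same: one coroutine per incoming request, with the event loop juggling them.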
Finally, solving these problems, asking these questions, and executing solutions is going to be 10000x easier if you just assign 1 assistant to a user. It’s not going to suddenly allow multithreading to be a thing, but it will make it super easy to direct things around in your program if you do so. This is partly why I recommended it in a previous topic of yours, I believe. It allows you to focus more on the big-picture problems of your future architecture than on getting (chat) threads to match up right.
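The bookkeeping for that is about as simple as it gets. In this sketch, `create_assistant()` is a hypothetical stand-in for whatever API call actually creates an assistant; the point is that routing becomes a single dict lookup:

```python
# Hypothetical stand-in for the API call that creates an assistant;
# in real code this would hit the API and return the new assistant's ID.
_next_id = 0

def create_assistant() -> str:
    global _next_id
    _next_id += 1
    return f"asst_{_next_id}"

# user_id -> assistant_id. With one assistant per user, directing a
# request is a dict lookup instead of matching up (chat) threads.
assistants: dict[str, str] = {}

def assistant_for(user_id: str) -> str:
    if user_id not in assistants:
        assistants[user_id] = create_assistant()
    return assistants[user_id]

a1 = assistant_for("alice")
a2 = assistant_for("bob")
assert assistant_for("alice") == a1  # same user -> same assistant
assert a1 != a2                      # different users -> different assistants
```

The same get-or-create pattern works with a database table instead of an in-memory dict once you need persistence.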
*NOTE: threads in parallelism contexts refers to CPU threads, not chat threads.