Do invocations of the same Assistant on multiple different threads at the same time run in parallel?

Are requests made at the same time to separate threads actually processed concurrently? In other words, can the same Assistant handle 10,000 runs at the same time with no queuing?

My planned architecture is one Assistant and many (10,000+) threads for a chat app, and reducing latency is a priority.

So I want to confirm that runs started at the same time do not affect each other: the execution time of a run on one thread is not impacted by the same Assistant executing runs on one or more other threads simultaneously.


Indeed, according to the documentation, the only restriction on the OpenAI side is the rate limits imposed on your organization. You can create as many simultaneous runs as those rate limits allow.

A run, initiated on a thread, consists of a series of API actions executed automatically by OpenAI based on the configuration supplied when the run is created.
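As a rough illustration, here is a minimal sketch of launching runs on many independent threads at once with the Python SDK. The thread, message, and run calls are the standard Assistants API methods; the asyncio fan-out pattern and the placeholder `ASSISTANT_ID` are assumptions for the example, not an official recommendation.

```python
# Sketch: creating runs on many separate threads concurrently.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()
ASSISTANT_ID = "asst_..."  # placeholder: your Assistant's id

async def run_on_new_thread(user_message: str) -> str:
    # Each conversation gets its own thread; runs on separate threads
    # do not queue behind one another, subject only to your rate limits.
    thread = await client.beta.threads.create()
    await client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=user_message
    )
    run = await client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=ASSISTANT_ID
    )
    return run.id

async def main() -> None:
    # Fire off many runs at once; only the org's rate limits cap this.
    run_ids = await asyncio.gather(
        *(run_on_new_thread(f"Hello from user {i}") for i in range(100))
    )
    print(run_ids)

asyncio.run(main())
```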

It is possible that a very short delay is needed to acquire a lock on the Assistant’s vector store for the file_search tool, if that tool is enabled on the Assistant.


There is, however, a low and undocumented limit on the number of API calls you can make to the assistants endpoint per minute. So plan on keeping usage down to roughly 1-2 run invocations per second, which is manageable if you stream responses or avoid aggressively polling threads; see the sketch below.
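For example, a relaxed polling loop like the following keeps status checks from burning through the per-minute request allowance. The `runs.retrieve` call and the run statuses are from the Assistants API; the 1-second interval and the helper name `wait_for_run` are assumptions chosen for illustration.

```python
# Sketch: polling a run at a relaxed interval instead of in a tight loop,
# so status checks don't exhaust the assistants endpoint's rate limit.
import time
from openai import OpenAI

client = OpenAI()

def wait_for_run(thread_id: str, run_id: str, poll_seconds: float = 1.0):
    # Terminal statuses for a run; anything else means keep waiting.
    terminal = {"completed", "failed", "cancelled", "expired", "requires_action"}
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in terminal:
            return run
        time.sleep(poll_seconds)  # each retrieve counts against the rate limit
```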
