How can a single instance of gpt handle multiple users at the same time?

Hello,

I have a prompt created via the OpenAI API and I’d like to deploy it.
I was wondering how simultaneous access by multiple users is handled. Is it something I should be worried about? Can the API handle this situation? Otherwise, what solutions exist for it (a queue, a job scheduler, etc.)?

Thank you for your help!

If you are talking about assistants built with the Assistants API, you would typically assign one thread per user.

No, I’m using ‘gpt-3.5-turbo-0125’ with the regular API.

Yeah, gpt-3.5-turbo-0125 is the model. Since you tagged this post as assistants-api, I am assuming you are using the Assistants API. Or do you mean the Chat Completions API?

Right, it’s the Chat Completions API.

If you are going to use the Chat Completions API, you need to manage the context messages yourself. What this means is that you will need something in the backend to keep track of the messages per user. That is the minimum you need to do.
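A minimal sketch of what "manage the messages per user" could look like. The class name, system prompt, and user IDs are placeholders I made up; an in-memory dict is only for illustration (a real deployment would use a database or cache):

```python
# Minimal sketch: one message list per user, rebuilt into a full
# context on every request, since Chat Completions is stateless.
from collections import defaultdict


class ConversationStore:
    """Keeps an independent conversation history for each user."""

    def __init__(self, system_prompt):
        self.system_prompt = system_prompt
        self.histories = defaultdict(list)  # user_id -> list of messages

    def messages_for(self, user_id):
        # The whole context is reconstructed for every API call.
        return [{"role": "system", "content": self.system_prompt}] + self.histories[user_id]

    def add(self, user_id, role, content):
        self.histories[user_id].append({"role": role, "content": content})


store = ConversationStore("You are a helpful assistant.")
store.add("alice", "user", "Hello!")
store.add("bob", "user", "Bonjour!")

# Each user's context is independent of every other user's:
print(len(store.messages_for("alice")))  # → 2 (system + 1 user message)
```

To get a reply for a given user, you would pass `store.messages_for(user_id)` as the `messages` argument of the Chat Completions call, then `store.add(user_id, "assistant", reply)` so the answer is part of that user's context next time.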

I understand, but my question is: can my model be accessed through the OpenAI API by multiple users at the same time? Will OpenAI be able to answer many users in parallel, each with their own history?

Completions and Chat Completions are completely stateless. They only “see” what you send them - you have to send the whole conversation each time to get an inference. And since you’re constructing the whole conversation object each time, it’s up to you what you put in it (or leave out).
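Because each request is stateless and self-contained, concurrent users are just parallel, independent requests (subject to your account's rate limits). A sketch of that shape, where `call_model` is a stand-in I invented for the real `chat.completions.create` call (which would need an API key):

```python
# Sketch: serving several users concurrently. Each request carries its own
# full conversation, so requests never interfere with one another.
from concurrent.futures import ThreadPoolExecutor


def call_model(messages):
    # Stand-in for the real API call; here it just echoes the last message.
    return f"echo: {messages[-1]['content']}"


def handle_request(user_id, history, user_text):
    # Build this user's full context from scratch for this one request.
    messages = history + [{"role": "user", "content": user_text}]
    reply = call_model(messages)
    return user_id, reply


users = {"alice": [], "bob": []}  # per-user histories (empty here)
with ThreadPoolExecutor() as pool:
    futures = [
        pool.submit(handle_request, uid, hist, f"hi from {uid}")
        for uid, hist in users.items()
    ]
    results = dict(f.result() for f in futures)

print(results["alice"])  # → echo: hi from alice
```

The point is that nothing is shared between the two in-flight requests except your own backend's per-user histories, so parallelism on OpenAI's side is not something you have to engineer around.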