How many runs should be created in a context of one thread?

After creating the first message (by the user) in a thread, a run should be created, so is this the only run that should be created? or should a new run be created after every single message a user creates?

Not sure what you mean by “run”, but every question is a single API request yes. You’re always asking one more question at a time right?

in the API assistants, there is something called “run” that should be created to get a response from the model after sending a question, so I am asking if I have to create a new run after every question sent to the model (message added to a thread) to get a response? or

Yes, after making an API call to create a thread which you associate with your own customer as a chat when you receive back its ID, you make a new API call, to put a message into a thread and waiting for its success message also, you then have to make another API call to “run” to connect a thread id to an assistant id and set the process of the backend querying an AI model into motion, while you then wait and continue calling for more polling API calls to attempt to find out what the assistant is doing, or open an SSE subscription and wait for push data chunks and then find out if they should be displayed or if the AI is trying to invoke tools with your code instead of tools with its own internal code that have been filtered out from your observation.

Don’t “run” twice on a user input or you either get a thread that has been locked from your management and an error, or you have executed another AI inference that will again generate, trying to figure out the new output it will make after the assistant was the last message.

Or you can just send messages to chat completion and get the response as soon as the tokens are generated.

@_j My question is: which is the right scenario:

First scenario:

  1. create thread
  2. add a user message
  3. create run
  4. retrieve assistant messages using the “run id” of the previous step
  5. repeat from 2

second scenario:

  1. create thread
  2. add user message
  3. create run
  4. retrieve assistant messages using the “run id” of the previous step
  5. add a user message
  6. retrieve assistant messages using the same “run id” of step 3
  7. repeat from 5

Neither of those are quite correct.

The reason being, the answer from the AI is also placed into a thread, and thus you would retrieve the output as the latest message that appears in a thread.

It is the status of a run that you continue polling for to find out when the run is finished, or is in a status “needs more data”.

If you are receiving the new streaming output from a run, you’d start receiving the final output text directly when the AI has decided to write an assistant output meant for the user, and not need to poll.

There is documentation already written about the steps to take to receive a response back; neither of us need to write out the procedures again:

https://platform.openai.com/docs/assistants/overview?context=with-streaming

Thank you @_j and yes I read the docs and sorry for the confusion, the scenario mentioned in this doc:

  1. create a thread
  2. add a user message
  3. create a new run (I choose without streaming)
  4. check the run status, if it “completed” move to the next step
  5. retrieve a list of messages, if the “run id” is included then it will retrieve only messages associated with the “run” created in step 3
  6. save new messages in your DB
  7. repeat from 2 (this the step the docs miss and I am asking about)

Right?

Close. After a run is done, you don’t refer to the run any more. The thread and its messages is what you pursue:

if run.status == 'completed': 
  messages = client.beta.threads.messages.list(
    thread_id=thread.id
  )
  print(messages)

The documentation is poor in that you instead would want to pick ascending or descending messages, and limit the retrieval to one latest message (or you can get two if you want to confirm the user/assistant pairing of recent messages.

Then the point of Assistants is that a thread is your database of a conversation. You reflect correctly that it cannot be the only database.

If you have the additional database to track your customers, track their multiple conversations, track their usage, track their policy violations, recall conversation titles, etc., then the reason for putting the maintenance of a “chat” within unbudgeted threads of assistants quickly becomes elusive. You see the value in skipping over this convoluted assistant framework entirely and implementing your own “chat completions” chatbot.

The API reference say that the “run id” is an optional Query parameter when listing messages so the messages will be filtered according to this “run id”: https://platform.openai.com/docs/api-reference/messages/listMessages#messages-listmessages-run_id

I used it and the messages retrieved was oly the recent assistant messages.

Some times there are more than one assistant message in a single run, some messages of type “text” and others of type “image_file”.

So what I do now is:

  1. creating thread
  2. add a message
  3. create a run without streaming
  4. checking its status
  5. retrieving list messages filtered be “run id”
  6. storing these messages in my DB as an assistant messages.
1 Like