How many runs should be created in a context of one thread?

salemmo409 · April 7, 2024, 7:37pm

After creating the first message (by the user) in a thread, a run should be created, so is this the only run that should be created? or should a new run be created after every single message a user creates?

wclayf · April 8, 2024, 3:06am

Not sure what you mean by “run”, but every question is a single API request yes. You’re always asking one more question at a time right?

salemmo409 · April 8, 2024, 5:24am

in the API assistants, there is something called “run” that should be created to get a response from the model after sending a question, so I am asking if I have to create a new run after every question sent to the model (message added to a thread) to get a response? or

_j · April 8, 2024, 6:03am

Yes, after making an API call to create a thread which you associate with your own customer as a chat when you receive back its ID, you make a new API call, to put a message into a thread and waiting for its success message also, you then have to make another API call to “run” to connect a thread id to an assistant id and set the process of the backend querying an AI model into motion, while you then wait and continue calling for more polling API calls to attempt to find out what the assistant is doing, or open an SSE subscription and wait for push data chunks and then find out if they should be displayed or if the AI is trying to invoke tools with your code instead of tools with its own internal code that have been filtered out from your observation.

Don’t “run” twice on a user input or you either get a thread that has been locked from your management and an error, or you have executed another AI inference that will again generate, trying to figure out the new output it will make after the assistant was the last message.

Or you can just send messages to chat completion and get the response as soon as the tokens are generated.

salemmo409 · April 8, 2024, 6:27am

@_j My question is: which is the right scenario:

First scenario:

create thread
add a user message
create run
retrieve assistant messages using the “run id” of the previous step
repeat from 2

second scenario:

create thread
add user message
create run
retrieve assistant messages using the “run id” of the previous step
add a user message
retrieve assistant messages using the same “run id” of step 3
repeat from 5

_j · April 8, 2024, 6:53am

Neither of those are quite correct.

The reason being, the answer from the AI is also placed into a thread, and thus you would retrieve the output as the latest message that appears in a thread.

It is the status of a run that you continue polling for to find out when the run is finished, or is in a status “needs more data”.

If you are receiving the new streaming output from a run, you’d start receiving the final output text directly when the AI has decided to write an assistant output meant for the user, and not need to poll.

There is documentation already written about the steps to take to receive a response back; neither of us need to write out the procedures again:

https://platform.openai.com/docs/assistants/overview?context=with-streaming

salemmo409 · April 8, 2024, 7:16am

Thank you @_j and yes I read the docs and sorry for the confusion, the scenario mentioned in this doc:

create a thread
add a user message
create a new run (I choose without streaming)
check the run status, if it “completed” move to the next step
retrieve a list of messages, if the “run id” is included then it will retrieve only messages associated with the “run” created in step 3
save new messages in your DB
repeat from 2 (this the step the docs miss and I am asking about)

Right?

_j · April 8, 2024, 7:29am

Close. After a run is done, you don’t refer to the run any more. The thread and its messages is what you pursue:

if run.status == 'completed': 
  messages = client.beta.threads.messages.list(
    thread_id=thread.id
  )
  print(messages)

The documentation is poor in that you instead would want to pick ascending or descending messages, and limit the retrieval to one latest message (or you can get two if you want to confirm the user/assistant pairing of recent messages.

Then the point of Assistants is that a thread is your database of a conversation. You reflect correctly that it cannot be the only database.

If you have the additional database to track your customers, track their multiple conversations, track their usage, track their policy violations, recall conversation titles, etc., then the reason for putting the maintenance of a “chat” within unbudgeted threads of assistants quickly becomes elusive. You see the value in skipping over this convoluted assistant framework entirely and implementing your own “chat completions” chatbot.

salemmo409 · April 8, 2024, 8:13am

The API reference say that the “run id” is an optional Query parameter when listing messages so the messages will be filtered according to this “run id”: https://platform.openai.com/docs/api-reference/messages/listMessages#messages-listmessages-run_id

I used it and the messages retrieved was oly the recent assistant messages.

Some times there are more than one assistant message in a single run, some messages of type “text” and others of type “image_file”.

So what I do now is:

creating thread
add a message
create a run without streaming
checking its status
retrieving list messages filtered be “run id”
storing these messages in my DB as an assistant messages.

Topic		Replies	Views
Assistant API - messages and runs API	2	497	May 19, 2024
Difference between Creating a Run and Creating a Thread and Run API assistants-api	5	325	November 7, 2024
Retrieve only Assistant's response from the last run API	4	10596	January 23, 2024
Multiple text responses in single Run (Assistant, no streaming) API	4	1302	May 8, 2024
Error running thread: already has an active run API	4	2444	May 29, 2024

How many runs should be created in a context of one thread?

Related topics