Same thread in API assistants

Hi everyone I have this code and I want to ask a question from the assistant and then provide some additional context and ask the same question again. Based on this setup that I have the follow-up response just gives me the question that I asked. Can anyone help me with that?
Here is the thing I am trying to do:
Question 1 > Response 1 > new information > Question 1 > Response 2

import os
import pandas as pd
from openai import OpenAI
from dotenv import load_dotenv


api_key = os.getenv('api_key')

client = OpenAI(api_key=api_key)

# Questions
questions = [
    "What do you expect the rate of inflation to be over the next 12 months? Please give your best guess. Over the next 12 months, I expect the rate of inflation to be",
    "What do you expect the rate of inflation to be over that period? Please give your best guess. Over the 12-month period between December 2025 (24 months from now) and December 2026 (36 months from now), I expect the rate of inflation to be"

# Assistant IDs
assistants = {
    "Assistant_P1_2023_new": "asst_8qH7In7VCTXidbKfPOj5mSsT",
    "Assistant_M1_2023_new": "asst_1JRk5Q9VVm1kptFDId7pTayc"

results = []

for assistant_name, assistant_id in assistants.items():
    for question in questions:
        # Create a Thread
        my_thread = client.beta.threads.create()

        my_thread_message = client.beta.threads.messages.create(

        # Run the Assistant
        my_run = client.beta.threads.runs.create(
            instructions="Do not use the exact inflation mentioned in the document. Use your general understanding of the document including the sentiments of the policy and all the information around it to answer. These are questions about inflation expectations and the perception of inflation, not inflation prediction. Do not answer anything. Give the number in percentage. Again, Please answer the questions with only a number based on your general understanding of the document . "

        while True:
            keep_retrieving_run = client.beta.threads.runs.retrieve(

            if keep_retrieving_run.status == "completed":
                all_messages = client.beta.threads.messages.list(
                first_response =[0].content[0].text.value

        follow_up_content = "Consider the mortgage rate is 7.5% and now answer the previous question again."

        my_follow_up_run = client.beta.threads.runs.create(
            instructions="Please consider the updated information about the mortgage rate being 7.5% what is you new answer to the previous question?"

        while True:
            follow_up_run_status = client.beta.threads.runs.retrieve(

            if follow_up_run_status.status == "completed":
                all_follow_up_messages = client.beta.threads.messages.list(
                follow_up_response =[-1].content[0].text.value
                results.append([assistant_name, question, first_response, follow_up_response])

df = pd.DataFrame(results, columns=['Assistant', 'Question', 'Initial Response', 'Follow-up Response'])

df.to_excel('results_with_follow_up.xlsx', index=False)

The only way you can do this in the flow you describe in Assistants is by processing the initial input outside of the assistants. You cannot export or duplicate a thread.

(Messages has an undocumented endpoint that can delete messages, not available in openai libraries. Workaround hacks may be able to let you “redo” the most recent messages after a deletion.)

However, tool calls are the typical way to augment the AI. That would mean that after receiving the user input, AI doesn’t answer, it instead calls on an external resource with queries that return that new information. The AI can then answer the user with the additional knowledge.

1 Like

What happens if you simply ADD that follow up question to the same thread? (ie add the next questions for the different rate and run the the thread (again)

The alternative is to create a new thread (like you do now BUT you need to make sure to add the original question in the NEW thread. Right now I think you are feeding the first thread content=question while you are feeding the follow up thread only the 'now … ’ as input - so it would not even know the question?

1 Like

I want to ask a question, get response and save it and then in the new prompt I has another question with some new information.
Question 1 > Response 1 > new information > Question 1 > Response 2

Do you have any sample code I can use?

Hey @jlvanhulst ,

I came across across that problem as well (using Node btw). Hope my code below works for you as well:“/assistant-chat”, async (req, res) => {
const { userMessage, threadId } = req.body;

try {
let currentThreadId = threadId;
// If a threadId is provided and exists, continue, otherwise, create a new thread
if (!currentThreadId || !threadResponses[currentThreadId]) {
// Create a new thread for the new case
const threadResponse = await openai.beta.threads.create();
currentThreadId =;
console.log(“New thread created with ID:”, currentThreadId);

  // Initialize storage for this new thread
  threadResponses[currentThreadId] = { events: [], clients: [] };
} else {
  console.log("Continuing conversation on thread:", currentThreadId);

// Here, before sending the user's message to OpenAI, signal the start of a new message
sendEventToAllClients(currentThreadId, { event: "messageStart", data: {} });

// Add the user's message to the thread
const messageResponse = await openai.beta.threads.messages.create(currentThreadId, {
  role: "user",
  content: userMessage,
console.log("User message added to the thread:", messageResponse);

// Stream the Run using the newly created or existing thread ID
const stream = openai.beta.threads.runs
  .createAndStream(currentThreadId, {
    assistant_id: assistantIdToUseSimone, // Ensure this variable is correctly defined
  .on("textCreated", (text) => {
    console.log("textCreated event:", text);
    sendEventToAllClients(currentThreadId, { event: "textCreated", data: text });
  .on("textDelta", (textDelta) => {
    // Optionally log textDelta events
    console.log("textDelta event Carl:", textDelta);
    sendEventToAllClients(currentThreadId, { event: "textDelta", data: textDelta });
  .on("toolCallCreated", (toolCall) => {
    console.log("toolCallCreated event:", toolCall);
    sendEventToAllClients(currentThreadId, { event: "toolCallCreated", data: toolCall });
  .on("toolCallDelta", (toolCallDelta) => {
    console.log("toolCallDelta event:", toolCallDelta);
    sendEventToAllClients(currentThreadId, { event: "toolCallDelta", data: toolCallDelta });
  .on("end", () => {
    console.log("Stream ended for threadId:", currentThreadId);
    sendEventToAllClients(currentThreadId, { event: "end", data: null });

res.status(200).json({ threadId: currentThreadId });

} catch (error) {
console.error(“Error handling /assistant-chat:”, error);
res.status(500).send(“Internal server error”);

1 Like

What if you have a thread within a thread that you can replace with the “more” information that one can build up; providing the context?

I think this is a fascinating concept of what the op is wanting to do. I have the solution worked out in the betaassi framework.

Will post code+working ui tmrw.

1 Like

Many thanks. I really appreciate if you could provide a sample code.

The demonstration of the capability is described here (

The code is linked here: openairetro/examples/advanced/websockets at main · icdev2dev/openairetro · GitHub

I believe that UI provides the most appropriate demonstration(and limitations) of the capabilities. But that also means that it is little difficult to fathom what is happening in the backend.

The implementation philosophy is simple. Create a thread that ONLY holds user input. Limit the number of times that the user can give clarifications. Clone that thread at runtime with all the messages (at this point only two; one for the initial question and one for additional clarification.

Hope this helps.