How to remember chat in OpenAI's Assistants API?

I have the working code below, but it is not able to remember the previous chat. I think a new thread is being created every time, which starts a new chat each time. How can I make it remember the chat and keep using a common thread until I decide to start a new one?

from openai import OpenAI
import time

ASSISTANT_ID = "asst_xxxxxxxxxxxxxxxx"

client = OpenAI(
    api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
)

def submit_message(assistant_id, thread, user_message):
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=user_message
    )
    return client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant_id,
    )


def get_response(thread):
    return client.beta.threads.messages.list(thread_id=thread.id, order="asc")

def create_thread_and_run(user_input):
    # A brand-new thread is created on every call, so no history carries over between calls
    thread = client.beta.threads.create()
    run = submit_message(ASSISTANT_ID, thread, user_input)
    return thread, run

# Pretty printing helper
def pretty_print(messages):
    print("# Messages")
    for m in messages:
        print(f"{m.role}: {m.content[0].text.value}")
    print()

# Waiting in a loop
def wait_on_run(run, thread):
    while run.status in ("queued", "in_progress"):
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id,
            run_id=run.id,
        )
        time.sleep(0.5)
    return run

# Every user input goes through create_thread_and_run, so each message starts a fresh thread
thread1, run1 = create_thread_and_run("Hello!")
# Wait for Run 1
run1 = wait_on_run(run1, thread1)
pretty_print(get_response(thread1))

Yes, that is by design. Think of a thread as a session. You don't create a thread every time you send a message to the same session. When the user sends their first message, you create the thread and store the thread ID somewhere. Then, when the user sends another message, you look up the stored thread ID, check that it is still viable, and use it to add your message.
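
Here is a minimal sketch of that pattern. The file-based store and the get_or_create_thread helper are just illustrations, not part of the API; any persistent store (database, user session, etc.) works the same way:

import os
from openai import OpenAI

client = OpenAI()
THREAD_FILE = "thread_id.txt"  # illustrative: any persistent store will do

def get_or_create_thread():
    if os.path.exists(THREAD_FILE):
        thread_id = open(THREAD_FILE).read().strip()
        try:
            # Check that the stored thread is still retrievable before reusing it
            client.beta.threads.retrieve(thread_id=thread_id)
            return thread_id
        except Exception:
            pass  # stored thread is gone; fall through and create a fresh one
    thread = client.beta.threads.create()
    with open(THREAD_FILE, "w") as f:
        f.write(thread.id)
    return thread.id

# Every message is added to the same thread, so the assistant keeps the context
thread_id = get_or_create_thread()
client.beta.threads.messages.create(
    thread_id=thread_id, role="user", content="Hello again!"
)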


Here is how I retrieve a thread and continue the conversation.
You do not have to create a new thread every time; you can retrieve the thread you want as long as you remember the thread ID.


from openai import OpenAI
import textwrap
from datetime import datetime

client = OpenAI()

# Replace 'your_thread_id' with the actual thread ID you want to retrieve.
your_thread_id = "thread_HGooiVXvPOt4lZv1aPbU8pqd"
#your_thread_id = "thread_U9NWEPwDgkqz2HzanRI6keqC"

# Retrieve the thread details
thread_details = client.beta.threads.retrieve(thread_id=your_thread_id)

# Print the thread details
print("Thread Details:")
print(thread_details)

# Retrieve the list of messages in the thread (newest first by default; pass order="asc" for chronological order)
thread_messages = client.beta.threads.messages.list(thread_id=your_thread_id)

# Print the messages in the thread
print("\nThread Messages:")
print()
for message in thread_messages.data:
    for content in message.content:
        # Check if the content is a text message
        if content.type == 'text':
            wrapped_text = textwrap.fill(content.text.value, width=120)
            print(f"Role: {message.role}, Content: {wrapped_text}\n")
        # Check if the content is an image file
        elif content.type == 'image_file':
            print(f"Role: {message.role}, File ID: {content.image_file.file_id}\n")
            # An image_file block carries a file ID; use the Files API to download it

    # Convert the Unix timestamp to a human-readable format (printed once per message)
    human_readable_timestamp = datetime.utcfromtimestamp(message.created_at).strftime('%Y-%m-%d %H:%M:%S UTC')

    print(f"Message ID: {message.id}")
    print(f"Timestamp: {message.created_at} ({human_readable_timestamp})")
    print("-" * 50)  # Separator line for readability


You can also pass the assistant response to OpenAI TTS:

from IPython.display import Audio, display
from openai import OpenAI
from datetime import datetime

# Initialize the OpenAI client
client = OpenAI()

# Retrieve the messages in the thread (assuming you've already defined `your_thread_id`)
thread_messages = client.beta.threads.messages.list(thread_id=your_thread_id)

# Find the most recent assistant message (messages are listed newest first by default)
assistant_response = None
assistant_message = None
for message in thread_messages.data:
    if message.role == "assistant" and message.content[0].type == "text":
        assistant_message = message
        assistant_response = message.content[0].text.value
        break

if assistant_response:
    # Now, we'll use OpenAI TTS to convert this message to speech
    response = client.audio.speech.create(
        model="tts-1",
        voice="onyx",
        speed="0.75",
        input=assistant_response  # Use the assistant's message content as input for TTS
    )

    # Save the audio to a file named after the assistant message's ID
    file_name = f"assistant_response_{assistant_message.id}.mp3"
    response.stream_to_file(file_name)
else:
    print("No assistant response found in the thread.")

# Print the assistant message that was converted
if assistant_response:
    print("Assistant message:", assistant_response)
else:
    print("No message found.")

# Use IPython's Audio class to play the audio within the notebook
if assistant_response:
    display(Audio(file_name))


Just note that the messages in threads are only retained for 30 days, so that will need managing if you plan on keeping them for longer.
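
If you do need a transcript for longer, one option is to export it yourself before it expires. A minimal sketch, assuming your_thread_id is the thread to back up (you cannot re-import the messages into a thread later, but you keep the text):

import json
from openai import OpenAI

client = OpenAI()

# Pull the messages oldest-first and keep only the text blocks
messages = client.beta.threads.messages.list(thread_id=your_thread_id, order="asc")
backup = [
    {"role": m.role, "created_at": m.created_at, "text": m.content[0].text.value}
    for m in messages.data
    if m.content and m.content[0].type == "text"
]

with open("thread_backup.json", "w") as f:
    json.dump(backup, f, indent=2)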


Ok, that is good to know that message retention is only 30 days.

Is there any link you’ve got to “30 days” regarding threads? I scanned the API reference and documentation and tutorials without even finding the word “days”.

It might be possible to “keepalive” by sending a “hi” on day 29, unless policy is worded differently.
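
If that works, the keepalive itself would just be adding a trivial message to the thread. Whether this actually resets the retention clock is pure assumption here, and saved_thread_id stands in for the thread ID you stored:

# Assumption: thread activity resets the 30-day clock (the policy wording doesn't say)
client.beta.threads.messages.create(
    thread_id=saved_thread_id,  # hypothetical: your stored thread ID
    role="user",
    content="hi",
)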

And if a thread conversation is gone but you saved it, there is no putting it back into an assistant…


Yeah, it caught me by surprise too; it's here:

Data retention policy by endpoint. (I’m assuming GDPR and related compliance)

https://platform.openai.com/docs/models/how-we-use-your-data

[image: screenshot of the data retention table from the linked docs page]


I think all of those are talking not about the utility provided to you, the developer, but rather about how long non-training API conversations and other model interactions are kept for purposes of abuse monitoring and such.

Compared to the 100GB of documents, various chats that linger in a database are basically free for OpenAI.

That the tokenizer could be exploited by a quadratic attack with certain sequences? “don’t care, compute for us is basically free” (consider thousands of GPU servers with CPU cores going along for the ride)

Thanks for replying, everyone. I got it with the help of all the answers you provided!
Thanks! I can't mark any one as the solution because all of them collectively contributed to it.