Message Context - Per Thread or Per Assistant?

I’m building an app on the Assistants API that multiple users will share. If I want to make sure that only user A’s messages are used as context for user A, and only user B’s messages as context for user B, do I need to create dedicated thread(s) for each user’s messages, or a dedicated assistant for each user? Obviously the latter would be a lot more expensive, since retrieval storage is priced per assistant.

I generally get the impression from the docs that context is limited to each thread.

But in the pricing section here, the docs say that multiple threads can receive knowledge from an Assistant, which makes it sound like all threads could be sharing the Assistant’s context:

> Retrieval is priced at $0.20/GB per assistant per day. If your application stores 1GB of files for one day and passes it to two Assistants for the purpose of retrieval (e.g., customer-facing Assistant #1 and internal employee Assistant #2), you’ll be charged twice for this storage fee (2 * $0.20 per day). This fee does not vary with the number of end users and threads retrieving knowledge from a given assistant.

Dedicated threads should work fine; I have a Streamlit page that multiple people use at once.

It just creates a new thread for each session: every time you open the page, it makes a fresh thread. It’s rudimentary, but it works for my purposes.

Thanks for the info; I had something similar in mind. Do you know whether the message context is specific to each thread, though?

It should be, unless you code something funky.

import base64

import streamlit as st
from openai import OpenAI

client = OpenAI()


def gpt4():
    if "openai_model" not in st.session_state:
        st.session_state["openai_model"] = "gpt-4-1106-preview"

    if "messages" not in st.session_state:
        st.session_state.messages = []

    # One thread per browser session. Streamlit reruns this script on every
    # interaction, so keep the thread id in session_state rather than
    # recreating the thread at module level on each rerun.
    if "thread_id" not in st.session_state:
        st.session_state.thread_id = client.beta.threads.create().id

    # Two columns: chat area plus a "Clear Chat" button.
    col1, col2 = st.columns([4, 1])
    col1.empty()
    if col2.button("Clear Chat"):
        st.session_state.messages = []

    # Replay the conversation stored for this session.
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    if prompt := st.chat_input("What is up?"):
        st.session_state.messages.append({"role": "user", "content": prompt})
        with st.chat_message("user"):
            st.markdown(prompt)

        with st.chat_message("assistant"):
            message_placeholder = st.empty()
            full_response = ""
            # Stream the reply chunk by chunk. Note: this call uses Chat
            # Completions with this session's message history as context;
            # the Assistants thread created above isn't involved here.
            for chunk in client.chat.completions.create(
                model=st.session_state["openai_model"],
                messages=[
                    {"role": m["role"], "content": m["content"]}
                    for m in st.session_state.messages
                ],
                stream=True,
                max_tokens=4000,
            ):
                # Each streamed chunk carries a delta whose content may be None.
                delta_content = chunk.choices[0].delta.content
                if delta_content is not None:
                    full_response += delta_content
                message_placeholder.markdown(full_response + "▌")

            message_placeholder.markdown(full_response)
        st.session_state.messages.append({"role": "assistant", "content": full_response})

    if st.button("Download Chat Log"):
        chat_log_str = "\n".join(
            f"{m['role']}: {m['content']}" for m in st.session_state.messages
        )
        b64 = base64.b64encode(chat_log_str.encode()).decode()
        st.markdown(
            f'<a href="data:file/txt;base64,{b64}" download="chat_log.txt">Download Chat Log</a>',
            unsafe_allow_html=True,
        )


gpt4()

This is basically what I do. Not sure if you can decipher my cryptic, horrible code, but it’s there if it helps.

This must be incredibly expensive!

It really just depends on what I’m using it for. It’s dirt cheap if I’m just asking basic questions or having it help refine some code, but the cost ramps up when I give it 30k tokens as context because I need help on something niche!
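(For a rough sense of scale: assuming gpt-4-1106-preview’s input rate of about $0.01 per 1K tokens at the time, a 30k-token context comes to roughly 30 × $0.01 ≈ $0.30 of input cost per request, before output tokens.)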

Did you find out the answer to this? I have the same question but there doesn’t seem to be a definitive answer here.

No, I didn’t, I’m afraid. I’m assuming it’s per thread, but it would be nice to know for sure :sweat_smile:

Did that architecture of one “master” assistant and a thread per user end up working nicely for you? I’m about to do something similar.

Yes, it works well, because you only need to customise the assistant’s knowledge, instructions, functions, etc. once, and then all users have access to those resources. I guess setting up an Assistant is the equivalent of creating a private GPT.
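
In case it helps anyone who lands here later, here’s roughly what that pattern looks like in code. This is a minimal sketch, assuming the beta Assistants endpoints in the openai Python SDK (runs.create_and_poll needs a fairly recent SDK version, and ASSISTANT_ID, new_thread_for_user and ask are placeholder names for illustration, not anything from this thread):

from openai import OpenAI

client = OpenAI()

# One shared "master" assistant, created once (e.g. offline) and reused by
# every user. ASSISTANT_ID is a placeholder for your real assistant's id.
ASSISTANT_ID = "asst_..."


def new_thread_for_user() -> str:
    # Each user gets their own thread; only that thread's messages are
    # used as context when the assistant runs on it.
    return client.beta.threads.create().id


def ask(thread_id: str, text: str) -> str:
    # Append the user's message to their own thread...
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=text
    )
    # ...then run the shared assistant against that thread only.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread_id, assistant_id=ASSISTANT_ID
    )
    if run.status != "completed":
        return f"run ended with status: {run.status}"
    # Messages come back newest-first; the first one is the reply.
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    return messages.data[0].content[0].text.value


# User A and user B each get an isolated context:
thread_a = new_thread_for_user()
thread_b = new_thread_for_user()
print(ask(thread_a, "My name is A. Please remember it."))
print(ask(thread_b, "What is my name?"))  # separate thread, so it won't know

Because each run only sees the messages in the thread it’s pointed at, user A’s history never leaks into user B’s replies, which matches the per-thread context behaviour everyone above was assuming.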