Message Context - Per Thread or Per Assistant?

I’m building an app on the Assistants API that multiple users will share. If I want to make sure that only user A’s messages are used as context for user A, and only user B’s messages as context for user B, do I need to create dedicated thread(s) for each user’s messages, or a dedicated assistant for each user? Obviously the latter would be a lot more expensive, since retrieval storage is priced per assistant.

I generally get the impression from the docs that context is limited to each thread.

But in the pricing section here, the docs say that multiple threads can receive knowledge from an Assistant, which makes it sound like all threads could be sharing the Assistant’s context:

> Retrieval is priced at $0.20/GB per assistant per day. If your application stores 1GB of files for one day and passes it to two Assistants for the purpose of retrieval (e.g., customer-facing Assistant #1 and internal employee Assistant #2), you’ll be charged twice for this storage fee (2 * $0.20 per day). This fee does not vary with the number of end users and threads retrieving knowledge from a given assistant.

Dedicated threads should work fine; I have a Streamlit page that multiple people use at once.

It just creates a new thread for each session: every time you open the page, it makes a fresh thread. It’s rudimentary, but it works for my purposes.

Thanks for the info; I had something similar in mind. Do you know whether the message context is specific to each thread, though?

It should be, unless you code something funky.

import base64

import streamlit as st
from openai import OpenAI

client = OpenAI()


def gpt4():
    if "openai_model" not in st.session_state:
        st.session_state["openai_model"] = "gpt-4-1106-preview"

    if "messages" not in st.session_state:
        st.session_state.messages = []

    # One thread per browser session. Streamlit reruns this script on every
    # interaction, so keep the thread id in session_state rather than
    # recreating the thread at module level on each rerun.
    if "thread_id" not in st.session_state:
        st.session_state.thread_id = client.beta.threads.create().id

    # Two columns: chat area plus a "Clear Chat" button.
    col1, col2 = st.columns([4, 1])
    col1.empty()
    if col2.button("Clear Chat"):
        st.session_state.messages = []

    # Replay the conversation stored for this session.
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    if prompt := st.chat_input("What is up?"):
        st.session_state.messages.append({"role": "user", "content": prompt})
        with st.chat_message("user"):
            st.markdown(prompt)

        with st.chat_message("assistant"):
            message_placeholder = st.empty()
            full_response = ""
            # Stream the reply chunk by chunk. Note: this call uses Chat
            # Completions with this session's message history as context;
            # the Assistants thread created above isn't involved here.
            for chunk in client.chat.completions.create(
                model=st.session_state["openai_model"],
                messages=[
                    {"role": m["role"], "content": m["content"]}
                    for m in st.session_state.messages
                ],
                stream=True,
                max_tokens=4000,
            ):
                # Each streamed chunk carries a delta whose content may be None.
                delta_content = chunk.choices[0].delta.content
                if delta_content is not None:
                    full_response += delta_content
                message_placeholder.markdown(full_response + "▌")

            message_placeholder.markdown(full_response)
        st.session_state.messages.append({"role": "assistant", "content": full_response})

    if st.button("Download Chat Log"):
        chat_log_str = "\n".join(
            f"{m['role']}: {m['content']}" for m in st.session_state.messages
        )
        b64 = base64.b64encode(chat_log_str.encode()).decode()
        st.markdown(
            f'<a href="data:file/txt;base64,{b64}" download="chat_log.txt">Download Chat Log</a>',
            unsafe_allow_html=True,
        )


gpt4()

This is basically what I do. Not sure if you can decipher my cryptic, horrible code, but it’s there if it helps.

This must be incredibly expensive!

It really just depends on what I’m using it for. It’s dirt cheap if I’m just asking basic questions or having it help refine some code, but the cost ramps up when I give it 30k tokens as context because I need help on something niche!
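(For a rough sense of scale: assuming gpt-4-1106-preview’s input rate of about $0.01 per 1K tokens at the time, a 30k-token context comes to roughly 30 × $0.01 ≈ $0.30 of input cost per request, before output tokens.)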

Did you find out the answer to this? I have the same question but there doesn’t seem to be a definitive answer here.

No, I didn’t, I’m afraid. I’m assuming it’s per thread, but it would be nice to know for sure :sweat_smile:

Did that architecture of one “master” assistant and a thread per user end up working nicely for you? I’m about to do something similar.

Yes, it works well, because you only need to customise the assistant’s knowledge, instructions, functions, etc. once, and then all users have access to those resources. I guess setting up an Assistant is the equivalent of creating a private GPT.
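
In case it helps anyone who lands here later, here’s roughly what that pattern looks like in code. This is a minimal sketch, assuming the beta Assistants endpoints in the openai Python SDK (runs.create_and_poll needs a fairly recent SDK version, and ASSISTANT_ID, new_thread_for_user and ask are placeholder names for illustration, not anything from this thread):

from openai import OpenAI

client = OpenAI()

# One shared "master" assistant, created once (e.g. offline) and reused by
# every user. ASSISTANT_ID is a placeholder for your real assistant's id.
ASSISTANT_ID = "asst_..."


def new_thread_for_user() -> str:
    # Each user gets their own thread; only that thread's messages are
    # used as context when the assistant runs on it.
    return client.beta.threads.create().id


def ask(thread_id: str, text: str) -> str:
    # Append the user's message to their own thread...
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=text
    )
    # ...then run the shared assistant against that thread only.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread_id, assistant_id=ASSISTANT_ID
    )
    if run.status != "completed":
        return f"run ended with status: {run.status}"
    # Messages come back newest-first; the first one is the reply.
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    return messages.data[0].content[0].text.value


# User A and user B each get an isolated context:
thread_a = new_thread_for_user()
thread_b = new_thread_for_user()
print(ask(thread_a, "My name is A. Please remember it."))
print(ask(thread_b, "What is my name?"))  # separate thread, so it won't know

Because each run only sees the messages in the thread it’s pointed at, user A’s history never leaks into user B’s replies, which matches the per-thread context behaviour everyone above was assuming.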