Questions about OpenAI Assistants API token limits?

Hello everyone, I’ve been testing OpenAI’s Assistants API. I think it’s great, especially after trying the Assistants playground, but I have some questions about token limits and consumption for inputs and outputs.

  1. The models I currently use with the Assistants API are from the GPT-4 series, most of which support a 128,000-token context window.
    (If I upload about 20,000 tokens in batches, can the assistant remember the entire content?)

  2. Although the context window is 128,000 tokens, is each input and output during a conversation with the AI limited to 4,096 tokens?
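Since the answer to question 1 depends on whether each batch fits the relevant limit, I estimate segment sizes before sending them. A minimal sketch using the rough ~4-characters-per-token heuristic (for exact counts, the tiktoken library with the model's encoding would be the proper tool):

```python
# Rough token estimate: English text averages about 4 characters per token.
# This is only a heuristic; tiktoken gives exact counts per model encoding.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

segment = "Q: How long was the interview? A: About two hours."
print(estimate_tokens(segment), "tokens (approx.)")
```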

When setting up the assistant, I need to provide instructions and also a system prompt to start the conversation. What is the difference between these two prompts, and which one matters more?
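My current understanding (which I would like confirmed) is that the assistant-level `instructions` field plays the role of the system message, while thread messages only accept the user and assistant roles, so my "system prompt" actually goes in as an ordinary user message. A sketch of the two locations, with placeholder text:

```python
# Hypothetical request shapes, not live API calls.
# 1) Assistant-level instructions: applied to every run, acting as the system message.
assistant_params = {
    "name": "Conversation Analysis",
    "instructions": "You are a senior assistant. Summarize the interview records I provide.",
    "model": "gpt-4o",
}

# 2) A "system prompt" posted to the thread: just a regular message,
#    since threads only accept the user/assistant roles.
thread_message = {
    "role": "user",
    "content": "I will provide segmented interview records. Summarize each segment.",
}
print(assistant_params["model"], thread_message["role"])
```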

  3. Do the tokens used for the assistant’s instructions and the system prompt count against the 4,096-token limit for each conversation?
    If my instructions take 100 tokens and the system prompt takes 150 tokens, how much will be deducted from the 4,096-token limit for each conversation?
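To make the arithmetic concrete, here is the budget I am picturing, under the assumption (which is exactly what I am asking about) that the instructions and system prompt count on the input side of the 128,000-token window while 4,096 is a separate per-response output cap:

```python
CONTEXT_WINDOW = 128_000  # total tokens the model can see per request
MAX_OUTPUT = 4_096        # per-response output cap (assumed separate from input)

instructions_tokens = 100   # assistant instructions, resent with every run
system_prompt_tokens = 150  # my "system prompt" message in the thread

# Room left for conversation history and new input each turn:
input_budget = CONTEXT_WINDOW - MAX_OUTPUT - instructions_tokens - system_prompt_tokens
print(input_budget)  # 123654
```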

Additionally, if my assistant has an attached document, are tokens consumed only when my input prompt tells the AI to refer to the attachment, or are they consumed by default?
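For reference, this is roughly how I understand a document is attached to a message in the v2 Assistants API; the file_id below is a placeholder, and the real file would be uploaded first with client.files.create (please correct me if this shape is wrong):

```python
# Hypothetical message parameters, not a live API call.
# The attachment exposes the file to the file_search tool for this thread.
message_params = {
    "role": "user",
    "content": "Please summarize the attached interview record.",
    "attachments": [
        {"file_id": "file-abc123", "tools": [{"type": "file_search"}]}  # placeholder id
    ],
}
# client.beta.threads.messages.create(thread_id=thread_id, **message_params)
print(message_params["attachments"][0]["tools"][0]["type"])
```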

Here is my assistant code. Please take a look and let me know your thoughts. Thank you!

import time

from openai import OpenAI

system_prompt = """
Due to the large size of the original conversation records, I will provide segmented content in parts. Each conversation segment is marked with "Q:" and "A:" to denote the question and answer.
Please objectively summarize these conversation segments into narrative paragraphs in Traditional Chinese.
"""

# API connection
with open('key.txt', 'r') as f:
    OPENAI_API_KEY = f.readline().strip()

client = OpenAI(api_key=OPENAI_API_KEY)

# Load or create assistant
try:
    with open('assistant_id.txt', 'r') as f:
        assistant_id = f.readline().strip()
    print("Using existing assistant ID:", assistant_id)
except FileNotFoundError:
    assistant = client.beta.assistants.create(
        name="Conversation Analysis",
        instructions="You are a senior assistant. Your task is to summarize the interview records I provide.",
        model="gpt-4o",  
    )
    assistant_id = assistant.id
    with open('assistant_id.txt', 'w') as f:
        f.write(assistant_id)
    print("Created new assistant ID:", assistant_id)

# Load or create conversation thread  
try:
    with open('thread_id.txt', 'r') as f:
        thread_id = f.readline().strip()
    print("Using existing thread ID:", thread_id)
except FileNotFoundError:
    thread = client.beta.threads.create()
    thread_id = thread.id
    with open('thread_id.txt', 'w') as f:
        f.write(thread_id)
    print("Created new thread ID:", thread_id)

# Send the initial "system prompt" to the thread as a user message
# (thread messages only accept the user/assistant roles)
client.beta.threads.messages.create(
    thread_id=thread_id,
    role="user",
    content=system_prompt
)

# Process each text segment in a loop; `segments` is the list of "Q:"/"A:" chunks prepared earlier
for countm, segment in enumerate(segments, start=1):
    # Send message to conversation thread
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=segment  # Send current text segment
    )

    # Run assistant to process current segment
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id
    )

    # Poll until the run reaches a terminal state
    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id,
            run_id=run.id
        )
        if run.status == "completed":
            print(f"Run completed. Segment {countm}")
            break
        elif run.status in ("failed", "cancelled", "expired"):
            print(f"Run ended with status {run.status}:", run.last_error)
            break
        time.sleep(2)

    # Messages are returned newest-first, so data[0] is the latest assistant reply
    messages = client.beta.threads.messages.list(
        thread_id=thread_id
    )
    message = messages.data[0].content[0].text.value
    print(message)