Questions about OpenAI Assistants API token limits?

Hello everyone, I’ve been testing OpenAI’s Assistants API. I think it’s great, especially after trying the Assistants playground, but I have some questions about token limits and consumption for inputs and outputs.

  1. The models I currently use with the Assistants API are from the GPT-4 series, most of which support a 128,000-token context window.
    (If I upload about 20,000 tokens in batches, can the assistant remember the entire content?)

  2. Although the context window is 128,000 tokens, is each input and output during a conversation with the AI limited to 4,096 tokens?
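Since the answer to question 1 depends on whether each batch fits the relevant limit, I estimate segment sizes before sending them. A minimal sketch using the rough ~4-characters-per-token heuristic (for exact counts, the tiktoken library with the model's encoding would be the proper tool):

```python
# Rough token estimate: English text averages about 4 characters per token.
# This is only a heuristic; tiktoken gives exact counts per model encoding.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

segment = "Q: How long was the interview? A: About two hours."
print(estimate_tokens(segment), "tokens (approx.)")
```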

When setting up the assistant, I need to provide instructions and also a system prompt to start the conversation. What is the difference between these two prompts, and which one matters more?
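My current understanding (which I would like confirmed) is that the assistant-level `instructions` field plays the role of the system message, while thread messages only accept the user and assistant roles, so my "system prompt" actually goes in as an ordinary user message. A sketch of the two locations, with placeholder text:

```python
# Hypothetical request shapes, not live API calls.
# 1) Assistant-level instructions: applied to every run, acting as the system message.
assistant_params = {
    "name": "Conversation Analysis",
    "instructions": "You are a senior assistant. Summarize the interview records I provide.",
    "model": "gpt-4o",
}

# 2) A "system prompt" posted to the thread: just a regular message,
#    since threads only accept the user/assistant roles.
thread_message = {
    "role": "user",
    "content": "I will provide segmented interview records. Summarize each segment.",
}
print(assistant_params["model"], thread_message["role"])
```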

  3. Do the tokens used for the assistant’s instructions and the system prompt count against the 4,096-token limit for each conversation?
    If my instructions take 100 tokens and the system prompt takes 150 tokens, how much will be deducted from the 4,096-token limit for each conversation?
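To make the arithmetic concrete, here is the budget I am picturing, under the assumption (which is exactly what I am asking about) that the instructions and system prompt count on the input side of the 128,000-token window while 4,096 is a separate per-response output cap:

```python
CONTEXT_WINDOW = 128_000  # total tokens the model can see per request
MAX_OUTPUT = 4_096        # per-response output cap (assumed separate from input)

instructions_tokens = 100   # assistant instructions, resent with every run
system_prompt_tokens = 150  # my "system prompt" message in the thread

# Room left for conversation history and new input each turn:
input_budget = CONTEXT_WINDOW - MAX_OUTPUT - instructions_tokens - system_prompt_tokens
print(input_budget)  # 123654
```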

Additionally, if my assistant has an attached document, are tokens consumed only when my input prompt tells the AI to refer to the attachment, or are they consumed by default?
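For reference, this is roughly how I understand a document is attached to a message in the v2 Assistants API; the file_id below is a placeholder, and the real file would be uploaded first with client.files.create (please correct me if this shape is wrong):

```python
# Hypothetical message parameters, not a live API call.
# The attachment exposes the file to the file_search tool for this thread.
message_params = {
    "role": "user",
    "content": "Please summarize the attached interview record.",
    "attachments": [
        {"file_id": "file-abc123", "tools": [{"type": "file_search"}]}  # placeholder id
    ],
}
# client.beta.threads.messages.create(thread_id=thread_id, **message_params)
print(message_params["attachments"][0]["tools"][0]["type"])
```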

Here is my assistant code. Please take a look and let me know your thoughts. Thank you!

import time

from openai import OpenAI

system_prompt = """
Due to the large size of the original conversation records, I will provide segmented content in parts. Each conversation segment is marked with "Q:" and "A:" to denote the question and answer.
Please objectively summarize these conversation segments into narrative paragraphs in Traditional Chinese.
"""

# API connection
with open('key.txt', 'r') as f:
    OPENAI_API_KEY = f.readline().strip()

client = OpenAI(api_key=OPENAI_API_KEY)

# Load or create assistant
try:
    with open('assistant_id.txt', 'r') as f:
        assistant_id = f.readline().strip()
    print("Using existing assistant ID:", assistant_id)
except FileNotFoundError:
    assistant = client.beta.assistants.create(
        name="Conversation Analysis",
        instructions="You are a senior assistant. Your task is to summarize the interview records I provide.",
        model="gpt-4o",  
    )
    assistant_id = assistant.id
    with open('assistant_id.txt', 'w') as f:
        f.write(assistant_id)
    print("Created new assistant ID:", assistant_id)

# Load or create conversation thread  
try:
    with open('thread_id.txt', 'r') as f:
        thread_id = f.readline().strip()
    print("Using existing thread ID:", thread_id)
except FileNotFoundError:
    thread = client.beta.threads.create()
    thread_id = thread.id
    with open('thread_id.txt', 'w') as f:
        f.write(thread_id)
    print("Created new thread ID:", thread_id)

# Send the initial "system prompt" to the thread as a user message
# (thread messages only accept the user/assistant roles)
client.beta.threads.messages.create(
    thread_id=thread_id,
    role="user",
    content=system_prompt
)

# Process each text segment in a loop; `segments` is the list of "Q:"/"A:" chunks prepared earlier
for countm, segment in enumerate(segments, start=1):
    # Send message to conversation thread
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=segment  # Send current text segment
    )

    # Run assistant to process current segment
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id
    )

    # Poll until the run reaches a terminal state
    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id,
            run_id=run.id
        )
        if run.status == "completed":
            print(f"Run completed. Segment {countm}")
            break
        elif run.status in ("failed", "cancelled", "expired"):
            print(f"Run ended with status {run.status}:", run.last_error)
            break
        time.sleep(2)

    # Messages are returned newest-first, so data[0] is the latest assistant reply
    messages = client.beta.threads.messages.list(
        thread_id=thread_id
    )
    message = messages.data[0].content[0].text.value
    print(message)