GPT-4o Assistant Thread Length Limit?

With Assistants, threads may grow very long (perhaps hitting the limit mentioned earlier). However, the entire conversation is not necessarily sent to the model on each run.

If you were to use gpt-3.5-turbo-0613 with its 4k-token context, the Assistants framework would reserve however many tokens it needs for a reply, perhaps a stock 2,000, and then pass only as many recent chat turns from the thread as fit in the remaining token budget for that model.
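As a rough sketch of how that fitting might work (this is an illustration, not OpenAI's actual implementation; the 4,096 context and 2,000-token reply reservation are the figures from above):

```python
import tiktoken

def fit_recent_turns(messages, model="gpt-3.5-turbo",
                     context_window=4096, reply_reservation=2000):
    """Walk the thread backwards, keeping the newest turns that fit
    in the context window after reserving room for the reply."""
    enc = tiktoken.encoding_for_model(model)
    budget = context_window - reply_reservation
    kept = []
    for msg in reversed(messages):  # newest turns first
        cost = len(enc.encode(msg["content"])) + 4  # rough per-message overhead
        if cost > budget:
            break
        budget -= cost
        kept.append(msg)
    return list(reversed(kept))  # restore chronological order
```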

Choose a 128k-context model, and the same thread can switch back to sending many more messages.

OpenAI has not provided a token limit you can set yourself for messages, but they did provide a number-of-past-turns option. That is the truncation option I mentioned earlier, which you can read about in the Assistants API documentation.

truncation_strategy is the technical tool to limit the number of messages passed to the AI.
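With the Python SDK you can set it per run. A minimal sketch; the IDs are placeholders and the last_messages value of 10 is just an illustration:

```python
from openai import OpenAI

client = OpenAI()

# Limit this run to the 10 most recent messages in the thread
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",      # placeholder thread ID
    assistant_id="asst_abc123",     # placeholder assistant ID
    truncation_strategy={
        "type": "last_messages",
        "last_messages": 10,
    },
)
```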

Using file_search is where you have little control: up to 20 chunks of 800+ tokens each can be placed into the AI's context. The Assistants framework doesn't take into account the per-tier rate maximum that can be sent to the model when it composes instructions, messages, tools, and retrieval or search context, and it will happily bill you $0.50+ per call. The only mitigation is to make your documents tiny so the chunks are tiny.
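To see where that bill comes from, here is the back-of-the-envelope worst case (the $5/1M input-token price is an assumption for illustration; check current pricing for your model):

```python
# Worst-case file_search load: 20 chunks of ~800 tokens each
chunks = 20
tokens_per_chunk = 800
search_context = chunks * tokens_per_chunk    # 16,000 tokens

# Add instructions, tools, and a long thread, and a run can
# approach a full 128k context window on every call
other_input = 112_000                         # assumed: rest of a filled 128k window
total_input = search_context + other_input    # 128,000 tokens

price_per_million = 5.00                      # assumed input price, USD per 1M tokens
print(f"~${total_input / 1_000_000 * price_per_million:.2f} per call")  # ~$0.64
```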
