How does an Assistants API thread defy the token limit?

I ran an experiment in which I introduced some seed information, dispersed within the first 27 messages, and then continued the thread on a topic with no overlap with the seed information.

After more than 10,000 raw tokens had passed since the seed information (“raw” meaning message content only, with no other overhead counted, so a conservative estimate), a user question was posed about that information (a rough reproduction sketch follows the outline below):

  • Info1
  • Several unrelated messages
  • Info2
  • Several unrelated messages
  • Info3
  • Several unrelated messages
  • Info4
  • 10k tokens of unrelated messages
  • Question to summarize “info”
  • Assistant: …
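For anyone who wants to try this themselves, here is a minimal sketch of the setup using the OpenAI Python SDK's beta Assistants endpoints. The assistant ID, filler text, and message counts are placeholders, not my exact values:

```python
from openai import OpenAI

client = OpenAI()
thread = client.beta.threads.create()

def add_user_message(text):
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=text
    )

# Disperse the four seed facts among unrelated filler messages.
for fact in ["Info1 ...", "Info2 ...", "Info3 ...", "Info4 ..."]:
    add_user_message(fact)
    for _ in range(5):                     # "several unrelated messages"
        add_user_message("Unrelated filler text ...")

# Pad well past the 8k context with ~10k raw tokens of unrelated messages.
for _ in range(100):
    add_user_message("More unrelated filler, roughly 100 tokens each ...")

# Finally, ask about the seed information.
add_user_message("Please summarize the 'Info' items from earlier.")
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id="asst_...",               # placeholder assistant ID
)
```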

Based on other posts (such as “Max number of tokens a Thread can use equal the Context Length of the used model?”), I’d expect the seed information to have aged out of the maximum context window (8k tokens for gpt-4-0613).

What I observed is that the seed information lying beyond the 8k-token limit was perfectly summarized.

How might this be occurring?


Received a response from the OpenAI engineering team:

This is part of our truncation logic that’s built into our Assistants API (from our docs):

“Once the size of the Messages in a Thread exceeds the context window of the model, the Thread will attempt to include as many messages as possible that fit in the context window and drop the oldest messages. Note that this truncation strategy will evolve over time to become more sophisticated.”

Therefore, some old messages will get retained as part of our truncation strategy and others will get dropped, similar to ChatGPT. Unfortunately, we can’t share more about this truncation strategy, so sorry about that!

This confirms that there is some secret sauce beyond a simple sliding window on the chat history.
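For contrast, a plain sliding window, the baseline the observed behavior evidently goes beyond, would look roughly like the sketch below. `count_tokens` stands in for a real tokenizer (e.g., tiktoken); nothing here reflects OpenAI's actual implementation:

```python
# Naive sliding-window truncation: keep only the newest messages that fit
# in the context window and drop everything older.
def sliding_window(messages, count_tokens, max_tokens=8192):
    kept, total = [], 0
    for msg in reversed(messages):         # walk newest to oldest
        cost = count_tokens(msg["content"])
        if total + cost > max_tokens:
            break                          # this and all older messages drop
        kept.append(msg)
        total += cost
    return list(reversed(kept))            # restore chronological order
```

Under this scheme the seed messages would always be the first to go, which is exactly what the experiment did not show.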


Nice insight! Yeah, it did seem weird. Numerous people have shown ChatGPT retaining very old “seed” messages after the conversation passes the context length, while forgetting others.

For ChatGPT, I’m not sure. It could be that once the token limit is hit, they perform a similarity search between the latest message and the rest of the conversation to determine which content to remove and which to retain; a rough sketch of that idea follows. Would be nice to get more insight here.
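If that guess is right, the retention step might look something like this purely speculative sketch. `embed` and `count_tokens` are placeholders for an embedding model and a tokenizer; none of this is confirmed by OpenAI:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retain_relevant(messages, latest_query, embed, count_tokens, max_tokens=8192):
    """Greedily keep the messages most similar to the latest query."""
    query_vec = embed(latest_query)
    # Rank every message by similarity to the newest user message.
    ranked = sorted(
        enumerate(messages),
        key=lambda im: cosine(embed(im[1]["content"]), query_vec),
        reverse=True,
    )
    kept, total = [], 0
    for i, msg in ranked:                  # most similar first
        cost = count_tokens(msg["content"])
        if total + cost <= max_tokens:
            kept.append((i, msg))
            total += cost
    kept.sort(key=lambda t: t[0])          # restore conversation order
    return [msg for _, msg in kept]
```

Something like this would explain why highly relevant old messages survive while low-relevance filler is dropped, regardless of age.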

Thanks for sharing!


I was wondering the same about similarity search. I suppose that would imply creating embeddings for each message. So far, OpenAI has refused to provide any implementation details about its “truncation strategy.”

Thanks for the feedback!
