Did Assistants API just get a big upgrade?

Looking through GitHub, it looks like we’ve gotten some context window management!

The documentation around truncation strategies also looks like it’s only partially updated.

Can anyone else confirm?

I’ve been able to confirm that I can now set a truncation_strategy for a run. If I understand it correctly, this would address the issue of assistant threads expanding to the maximum size of the context window.
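Here’s a minimal sketch with the Python SDK, assuming the `last_messages` shape the partially updated docs describe (the thread and assistant IDs are placeholders, and the exact field names may still change):

```python
from openai import OpenAI

client = OpenAI()

# Sketch: create a run that keeps only the most recent thread messages in
# context. Field names follow the partially updated docs and may change.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",      # placeholder thread ID
    assistant_id="asst_abc123",     # placeholder assistant ID
    truncation_strategy={
        "type": "last_messages",    # or "auto" to let the API manage context
        "last_messages": 10,        # keep only the 10 most recent messages
    },
)
```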

So: `messages[-n:]`

Better than nothing.

I suppose you could put a dynamic slider in your chat UI that greys out the old turns appropriately. But you ultimately can’t know the upper length that’s actually being managed.

But then: how does it affect all the unseen internal tool calls and tool returns in a thread?

Is the count based only on user messages that initiate a turn and what follows them? Or on any message type, which could leave you with an assistant response stripped of its context…

and why not:

```
truncation_strategy: {
    "max_messages": number,
    "max_message_tokens": number,
    "truncate_at": ["smaller" | "larger"]
}
```
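For instance, a run request using that shape might look like the sketch below. To be clear, this is purely hypothetical: `max_messages`, `max_message_tokens`, and `truncate_at` are the proposal above, not fields the current API accepts.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical: none of these truncation_strategy fields exist today;
# they only illustrate the proposed schema above.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    truncation_strategy={
        "max_messages": 20,          # cap on how many messages are kept
        "max_message_tokens": 8000,  # token budget across kept messages
        "truncate_at": "smaller",    # apply whichever limit bites first
    },
)
```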

Between this and the max completion and max prompt tokens (https://platform.openai.com/docs/assistants/how-it-works/max-completion-and-max-prompt-tokens) for a run, it’s moving in the right direction. The payload structure for truncation_strategy definitely seems like it could support your idea.
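A run combining both controls might look like this sketch (parameter names taken from the linked docs page, but worth double-checking while the docs are in flux):

```python
from openai import OpenAI

client = OpenAI()

# Sketch: pair token caps with a truncation strategy on a single run.
# max_prompt_tokens / max_completion_tokens bound token usage, while
# truncation_strategy bounds which messages stay in context.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",       # placeholder IDs
    assistant_id="asst_abc123",
    max_prompt_tokens=16000,         # cap tokens sent to the model
    max_completion_tokens=1000,      # cap tokens generated by the run
    truncation_strategy={
        "type": "last_messages",
        "last_messages": 10,
    },
)
```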