How many tokens is the context window in the OpenAI Assistants API?

I can’t find this figure anywhere in the documentation. I’m developing a chatbot where threads can get very long, so if the token window is small, I need a way to ensure that past messages which drop out of the window don’t get “lost”… some way of feeding them back in (maybe generating a summary and seeding a new thread with it? Sounds tedious, though…)

Thanks

If you are actually using the API’s Assistants feature, which is an agent framework, then the conversation is managed for you, with no possibility of outside control.

The context length varies by model, from 4k with gpt-3.5-turbo-0613, to 128k with gpt-4-turbo-0125.

If you use AI models via the chat completions endpoint, then you are the one in control of what conversation history is sent with every new chat turn, and you can use techniques such as having a background AI summarize the oldest part of the chat to continue the illusion of memory a bit longer.
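That summarization technique can be sketched roughly like this (a minimal sketch: `summarize` is a placeholder standing in for a background chat-completions call, and the message dicts just mirror the API’s role/content shape):

```python
# Sketch: keep recent turns verbatim, fold older turns into one summary message.
# `summarize` is a stub for a background call to a cheap summarization model.

def summarize(messages):
    # Placeholder: in practice, send the old messages to the chat completions
    # endpoint with a "summarize this conversation" instruction.
    return "Summary of %d earlier messages." % len(messages)

def compact_history(history, keep_recent=4):
    """Replace all but the last `keep_recent` messages with one summary message."""
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary_msg = {"role": "system",
                   "content": "Earlier conversation summary: " + summarize(old)}
    return [summary_msg] + recent

history = [{"role": "user", "content": "msg %d" % i} for i in range(10)]
compacted = compact_history(history)
print(len(compacted))  # 5: one summary message plus the 4 most recent
```

You’d run `compact_history` before each API call (or whenever the history nears your budget), so the model always sees the recent turns plus a condensed stand-in for everything older.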

Is there more information on how this is managed? Is it indeed a sliding window, whereby old messages are simply dropped?

On chat completions, if you send an input too large for the model (counting the output reservation set by max_tokens), you will simply get an error. Therefore you must do your own token counting and decide on your own technique and budget for limiting chat size.
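That budgeting can be sketched like this (a minimal sketch: the ~4 characters per token figure is only a rough heuristic, and a real implementation would count with an actual tokenizer such as tiktoken; the window and reservation numbers are just illustrative):

```python
# Sketch: drop oldest messages until the prompt fits the token budget,
# after reserving room for the reply (the max_tokens value you will send).
# Token counts use a rough ~4 chars/token heuristic; use a real tokenizer
# (e.g. tiktoken) for accurate counts.

CONTEXT_WINDOW = 16385   # example figure; the real limit varies by model
MAX_TOKENS_REPLY = 1024  # output reservation you plan to pass as max_tokens

def estimate_tokens(message):
    # Very rough: ~4 characters per token, minimum 1 per message.
    return max(1, len(message["content"]) // 4)

def trim_to_budget(system_msg, history, budget):
    """Drop the oldest history messages until everything fits in `budget` tokens."""
    kept = list(history)
    while kept and estimate_tokens(system_msg) + sum(map(estimate_tokens, kept)) > budget:
        kept.pop(0)  # oldest message goes first
    return [system_msg] + kept

budget = CONTEXT_WINDOW - MAX_TOKENS_REPLY
```

The system message is always kept; only the oldest chat turns are sacrificed, which is essentially the sliding-window behavior you’d otherwise get implicitly, but now under your control and error-free.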

Assistants simply drops old chat… and makes sure you are spending the maximum on every API call.

Yeah, I’m talking about Assistants. Are there standard-practice ways to ensure old messages aren’t dropped? Reinjecting summaries?

When using Assistants, you delegate all management to OpenAI. You can, and must, keep adding messages to a thread, but when it is actually run, you have no idea what cutoff threshold will be used, and you have no way to edit, alter, or replace messages (or the unseen tool history). OpenAI has purposefully disabled a method that was discovered to at least delete old messages.
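Given those constraints, about the only lever left is the workaround from the original question: periodically summarize the thread yourself and seed a fresh thread with that summary. A rough sketch of the control flow (pure-Python stubs only: `FakeThread`, `summarize_thread`, and the rollover threshold all stand in for real Assistants SDK calls and are assumptions, not the actual API):

```python
# Sketch: when a thread's message count passes a threshold, summarize it
# and start a new thread seeded with that summary. Everything here is a
# stub standing in for real Assistants API calls.

ROLLOVER_AT = 50  # rotate threads after this many messages (arbitrary choice)

class FakeThread:
    """Stub for an Assistants thread: just a list of message dicts."""
    def __init__(self, messages=None):
        self.messages = list(messages or [])

def summarize_thread(thread):
    # Stub: in practice, run a summarization pass over thread.messages
    # (e.g. via the chat completions endpoint).
    return "Summary of %d messages." % len(thread.messages)

def add_user_message(thread, content):
    """Add a message, rolling over to a summary-seeded thread when too long."""
    if len(thread.messages) >= ROLLOVER_AT:
        seed = {"role": "user",
                "content": "Context from earlier conversation: "
                           + summarize_thread(thread)}
        thread = FakeThread([seed])  # new thread seeded with the summary
    thread.messages.append({"role": "user", "content": content})
    return thread
```

The caller always uses whatever thread `add_user_message` returns, so rollover is transparent; the old thread can be archived or deleted once the summary is in place.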

Reason #160 why Assistants is unsuitable for anyone who can code (that is, if Assistants didn’t already need a whole bunch more code anyway).