Context tokens in the Assistants API

So, based on your explanations, the context token usage in a Thread grows at each interaction, because the AI model is memoryless and each interaction adds context to the Thread?

ContextUsage(t+1) = MIN(ContextUsage(t), MAX_CONTEXT_LENGTH) + (userInput(t+1) + functionCall(t+1) + ...)

Meaning that once the context tokens of a Thread cost $1 to send, every subsequent conversation call will cost at least $1?
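To make the growth concrete, here is a back-of-the-envelope simulation in plain Python (no API calls). The per-turn token counts and the $0.01 per 1K input tokens price are my own assumptions, based on the published gpt-4-turbo-preview pricing:

```python
# Back-of-the-envelope simulation of how context tokens accumulate
# across Runs in a Thread (plain Python, no API calls).

PRICE_PER_1K_INPUT = 0.01      # assumed gpt-4-turbo-preview input price, USD
MAX_CONTEXT_LENGTH = 128_000   # model context window, tokens

context = 2_000                # instructions + first user message (assumed)
for run in range(1, 15):
    # Every prior message in the Thread is re-sent on each Run,
    # so the entire accumulated context is billed again as input.
    cost = context / 1_000 * PRICE_PER_1K_INPUT
    print(f"run {run:2d}: context = {context:7,d} tokens -> input cost ${cost:.2f}")
    # The model's reply, tool outputs, and the next user message are
    # appended to the Thread; usage is capped at the context window.
    context = min(context + 10_000, MAX_CONTEXT_LENGTH)
```

With these assumed numbers, each call costs more than $1 in input tokens alone by about the eleventh run, and usage eventually plateaus at the 128K context window.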

EDIT

The answer to both questions is yes.

I read this thread, which answers all my questions. I hit a case in a debugging session, with no context token limitation on a Thread using the gpt-4-turbo-preview model, that burned all my credit ($20) because subsequent Run calls were each charged more than $1 …

This charging mechanism should be documented in detail somewhere in the OpenAI docs. My whole app workflow needs to be reworked to account for this growing Thread context mechanism.

I need to think about solutions to reduce a Thread's context, like summarizing the Thread's content when it reaches a certain limit and creating a new Thread with this summarized information as pre-prompt history context.
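A minimal sketch of that rollover idea, assuming the openai Python SDK v1 and the beta Assistants API. The MAX_CONTEXT_TOKENS budget, the summarization model, and the prompt wording are illustrative choices of mine, not an official pattern:

```python
from openai import OpenAI

client = OpenAI()
MAX_CONTEXT_TOKENS = 30_000  # arbitrary budget; tune to your cost tolerance

def rollover_thread(thread_id: str, last_usage: int) -> str:
    """If the Thread's context grew past the budget, summarize it and
    start a fresh Thread seeded with the summary. Returns the Thread id
    to use for the next Run (the old one if no rollover was needed)."""
    if last_usage < MAX_CONTEXT_TOKENS:
        return thread_id

    # Pull the conversation out of the old Thread
    # (paginate for very long Threads; this grabs one page).
    messages = client.beta.threads.messages.list(thread_id=thread_id, order="asc")
    transcript = "\n".join(
        f"{m.role}: {m.content[0].text.value}"
        for m in messages.data
        if m.content and m.content[0].type == "text"
    )

    # Summarize with a cheaper model so the summary itself stays inexpensive.
    summary = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Summarize this conversation, keeping every fact, "
                        "decision and open question."},
            {"role": "user", "content": transcript},
        ],
    ).choices[0].message.content

    # Seed a new Thread with the summary as pre-prompt history context.
    new_thread = client.beta.threads.create(
        messages=[{
            "role": "user",
            "content": f"Summary of the conversation so far:\n{summary}",
        }]
    )
    return new_thread.id
```

The last_usage value can be read from run.usage.prompt_tokens once the previous Run has completed, if your API version exposes Run usage. Depending on the SDK version, Runs may also accept a truncation_strategy parameter (e.g. last_messages) that limits how much of the Thread is sent per Run without summarizing.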
