Currently I’m testing the Assistants API and I have a question. As I understand it, the instructions given to an Assistant are sent as context, and those instructions are counted as context tokens. Am I right?
The language around model usage was changed a bit on the new usage page in November, with the release of Assistants (and that page now shows less detailed information).
Just as the terms “GPT” and “Assistant” were already used for other purposes, our ability to refer to things clearly is now further muddied by the new use of “context” to refer to yet another thing.
| Usage page term | Completions API object |
| --- | --- |
| Context tokens | `prompt_tokens` |
| Generated tokens | `completion_tokens` |
That alone should help clarify the multiple terms for the same thing.
Assistants also have non-transparent usage: for example, the entire “thread” conversation must be sent to an AI model as input for each question. That input, plus the new output the AI writes, can be repeated multiple times and grow as the assistant also makes multiple internal calls to retrieval, the code interpreter, or the developer’s own API tool functions, all to finally form one response to be read by the end user.
Uses input tokens:
- Your provided instructions
- OpenAI-provided instructions of assistants
- Your provided tool specifications
- OpenAI internal undisclosed tool specifications of assistants
- Conversation history from the thread (as much as you budget, or the maximum the assistant sends)
- Files from the thread and files attached to the assistant, inserted by RAG automation
- Files from the thread and files attached to the assistant, retrieved by a function call
- Code interpreter results
- Past AI function-call language and results added to the conversation thread
Uses output tokens:
- The AI generating language for all those internal purposes besides writing to the user
No per-run token usage is provided to you by the Assistants API; you get only the daily bar graphs on a web page.
When making direct calls to models through the completions endpoints, token usage is either returned or directly calculable. You see all the tokens you sent and received in each individual call (except for slight obfuscation of the counted/billed overhead of tools/functions).
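As a sketch of that accounting, per-call cost can be computed directly from the `usage` counts a completions call returns. The prices below are placeholders I chose for illustration, not current OpenAI rates:

```python
def call_cost(prompt_tokens: int, completion_tokens: int,
              input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Compute the billed cost of one completions call from its usage counts."""
    return (prompt_tokens / 1000) * input_price_per_1k \
         + (completion_tokens / 1000) * output_price_per_1k

# Example: a call carrying 50k context tokens that generates 500 tokens,
# at hypothetical rates of $0.01/1k input and $0.03/1k output:
cost = call_cost(50_000, 500, 0.01, 0.03)
print(f"${cost:.3f}")  # → $0.515
```

Note that the input side dominates for long conversations: here the 50k resent context costs $0.50 while the fresh answer costs only $0.015.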
AI models are memoryless; they must be loaded with prompt information on each call before they can generate based on it, including instructions (system messages) and earlier chat.
So “context” generally means: the additional information we humans need in order to understand how to answer; and also all the information the AI needed to know on every single model call, loaded into a memory area called the “context window”, where the answer is also formed.
Hopefully that covers all facets of usage and billing you might encounter.
So based on your explanations, the Context Tokens usage in a Thread grows at each interaction, since the AI model is memoryless and each interaction adds context to the Thread?
ContextUsage(t+1) = MIN(ContextUsage(t), MAX_CONTEXT_LENGTH) + (userInput(t+1) + functionCall(t+1) + ...)
Meaning that if the Context Tokens cost of a Thread reaches $1, all subsequent conversation calls will cost at least $1?
The answers are yes.
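The recurrence above can be sketched numerically. The per-turn token count and the $0.01-per-1k input price are illustrative assumptions, not actual rates:

```python
MAX_CONTEXT_LENGTH = 128_000   # e.g. a 128k-context model
INPUT_PRICE_PER_1K = 0.01      # hypothetical input price, $ per 1k tokens

def next_context(context_usage: int, new_tokens: int) -> int:
    """ContextUsage(t+1) = MIN(ContextUsage(t), MAX_CONTEXT_LENGTH) + new input."""
    return min(context_usage, MAX_CONTEXT_LENGTH) + new_tokens

usage = 0
for turn in range(1, 11):
    # new user input + internal tool calls + outputs fed back into the thread
    usage = next_context(usage, 15_000)
    cost = usage / 1000 * INPUT_PRICE_PER_1K
    print(f"run {turn}: {usage:>7} context tokens -> ${cost:.2f} input cost")
    # by run 7 the input side alone costs over $1 per call
```

Once the thread fills the context window, the `min()` cap stops the growth, but every subsequent run still pays for a near-full window of input tokens.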
I read this thread, and it answers all my questions. I hit a case in a debugging session with no context-token limitation on a gpt-4-turbo-preview model that burned all my credit ($20), because subsequent Run calls were charged more than $1 each …
This charging mechanism should be documented in detail somewhere in the OpenAI docs; my whole app workflow needs to be reworked to account for this growing Thread context mechanism.

I need to think about solutions to reduce a Thread’s context, like summarizing the Thread context when it reaches a certain limit and creating a new Thread with that summarized information as a pre-prompt history context.
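One sketch of that rollover idea, with a stubbed summarizer (a real one would be a cheap model call) and a rough 4-characters-per-token estimate, both invented here for illustration:

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Rough token estimate: ~4 characters per token (rule of thumb, not exact)."""
    return sum(len(m["content"]) for m in messages) // 4

def maybe_rollover(messages: list[dict], limit: int, summarize) -> list[dict]:
    """If the thread's estimated tokens exceed `limit`, replace it with a
    fresh thread seeded by a summary message; otherwise keep it as-is."""
    if estimate_tokens(messages) <= limit:
        return messages
    summary = summarize(messages)
    return [{"role": "user", "content": f"Summary of prior conversation: {summary}"}]

# Stub summarizer for illustration; in practice, send the messages to a model.
fake_summarize = lambda msgs: f"{len(msgs)} messages about token billing"

thread = [{"role": "user", "content": "x" * 5_000}]
thread = maybe_rollover(thread, limit=1_000, summarize=fake_summarize)
print(thread[0]["content"])  # the new thread starts from the summary
```

The design choice is the trade-off: you pay one summarization call and lose detail, but every later run carries a few hundred tokens of summary instead of the whole history.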