Context tokens in the Assistants API

The terminology of model usage changed a bit on the new usage page in November, with the release of Assistants (and with that page now showing less detailed information).

Just as the terms “GPT” and “Assistant” were already overloaded with other meanings, our ability to refer to things clearly is now further muddled by the new use of “context” to mean yet another thing.

| Pricing page | Usage page | Completions API object |
| --- | --- | --- |
| Input | Context tokens | prompt_tokens |
| Output | Generated tokens | completion_tokens |

That alone should help clarify the multiple terms for the same thing.


Assistants also have non-transparent usage: for example, an entire “thread” conversation must be sent to an AI model as input for each question. That input, plus the new output the AI writes, can be repeated multiple times and grow as the assistant makes multiple internal calls to retrieval, the code interpreter, or the developer’s own API tool functions - all to finally form one response to be read by the end user.

Uses input tokens:

  • Your provided instructions
  • OpenAI-provided instructions of assistants
  • Your provided tool specifications
  • OpenAI internal undisclosed tool specifications of assistants
  • Conversation history from the thread (which you can budget, or which assistants will send up to the model maximum; see the token-counting sketch after these lists)
  • Files from the thread and files attached to the assistant, inserted by RAG automation
  • Files from the thread and files attached to the assistant, retrieved by function call
  • Code interpreter results
  • Past AI function-call language and tool results added to the conversation thread

Uses output tokens:

  • Language the AI generates for all of those internal purposes, in addition to what it writes to the user
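
Since the thread history is the portion you can most directly budget, you can estimate its token footprint yourself before a run. Here is a minimal sketch using the tiktoken library; the per-message overhead constant is an assumption (the true formatting overhead varies by model), and the example messages are made up:

```python
import tiktoken

# cl100k_base is the encoding used by the gpt-3.5-turbo and gpt-4 families
enc = tiktoken.get_encoding("cl100k_base")

def estimate_thread_tokens(messages, per_message_overhead=4):
    """Rough input-token estimate for a list of chat messages.

    per_message_overhead is an assumed constant for the formatting
    tokens wrapped around each message.
    """
    total = 0
    for message in messages:
        total += per_message_overhead
        total += len(enc.encode(message["content"]))
    return total

thread = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the attached report."},
]
print(estimate_thread_tokens(thread))  # prints the estimated input-token count
```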

The Assistants API provides no per-run token usage to you; your only view is the daily bar graphs on a web page.

When making direct calls to API models through the completions endpoints, token usage is either returned to you or directly calculable. You see all the tokens you sent and received in each individual call (apart from slight obfuscation of the overhead counted and billed for tools and functions).
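
For contrast, here is what that per-call transparency looks like when calling the chat completions endpoint with the Python SDK (the model name is just an example):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)

# The usage object reports exactly what this call consumed and billed
print(response.usage.prompt_tokens)      # input ("context") tokens
print(response.usage.completion_tokens)  # output ("generated") tokens
print(response.usage.total_tokens)
```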

AI models are memoryless; on each call they must be loaded with prompt information, including instructions (system messages) and any earlier chat, before they can generate a response based on it.
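
That statelessness is easy to see in code: to hold a conversation over the chat completions endpoint, you must resend the growing history yourself on every call. A minimal sketch (model name again an example):

```python
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_text: str) -> str:
    # The model retains nothing between calls, so the full prior
    # conversation is sent along with every new question.
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

ask("Hi, my name is Sam.")
print(ask("What is my name?"))  # answerable only because history was resent
```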

So “context”, generally, means the additional information we humans need in order to know how to answer. For the AI, it is all the information that must be supplied on every single model call, loaded into a memory area called the “context window”, where the answer is also formed.

Hopefully that covers all facets of usage and billing you might encounter.
