I’m trying to understand how tokens are calculated in the OpenAI API, because I’m encountering unexpected token counts. For example, I sent a basic query like “Hello,” yet the token count for the request was unexpectedly high at 898 tokens, even though the user message itself contributes almost nothing.
This makes me wonder:
Are the prompt instructions included in the token count?
If so, why does the total token count not match my own calculations? For instance, in my case, the prompt instructions should only account for around 200 tokens.
Could you clarify how tokens are calculated in detail and what factors might contribute to these discrepancies?
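For reference, this is roughly how I’m estimating tokens on my side, a minimal sketch using tiktoken (the model name and the instruction text are just placeholders for my own setup):

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Estimate how many tokens `text` uses for a given model."""
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: fall back to a common encoding.
        enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))

print(count_tokens("Hello"))                        # the user message: ~1 token
print(count_tokens("<my assistant instructions>"))  # my instructions: ~200 tokens by this count
```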
The instructions placed into the AI model’s input for the built-in tools, file search, code interpreter, parallel multi-function output, and your own functions provided as tools, are quite lengthy.
That is likely what you are experiencing: even the very first question to an assistant (“question 0”) comes with high input token usage. I see the AI talking about “files”, and file search (or its alter ego “myfiles_browser”) has an especially huge tool description.
As discussed, it is these lengthy internal tool descriptions, provided to the Assistants AI model alongside your own instructions, that are being measured and shown.
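If you want to verify this yourself, the run object reports token usage once the run completes. A minimal sketch, assuming the current openai Python SDK’s beta Assistants endpoints; the thread and run IDs are placeholders for your own values:

```python
from openai import OpenAI

client = OpenAI()

# Retrieve a completed run and inspect the usage it reports
# (thread_id and run_id are placeholders).
run = client.beta.threads.runs.retrieve(
    run_id="run_abc123",
    thread_id="thread_abc123",
)

if run.usage:
    print("prompt tokens:    ", run.usage.prompt_tokens)      # includes injected tool descriptions
    print("completion tokens:", run.usage.completion_tokens)
    print("total tokens:     ", run.usage.total_tokens)
```

The prompt_tokens figure is the one that includes the injected tool text, which is why it comes out much larger than your instructions plus the user message.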
"You have a file search tool… " that continues for many paragraphs gives you built-in expense when file search on vector stores is enabled.
Ask a question where the AI performs a document search instead of just saying hello, and the expense of the response will jump from 800 tokens to 10000+ input tokens…
The run parameter for the maximum number of search result chunks returned was not being respected the last time I checked Assistants in the Playground UI.
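If you want to try limiting that yourself, the v2 file_search tool definition accepts a max_num_results option and runs accept a max_prompt_tokens cap. A sketch, assuming the Python SDK and placeholder IDs, with no guarantee the chunk limit is actually honored given the behavior noted above:

```python
from openai import OpenAI

client = OpenAI()

# Override the file_search tool for a single run to limit retrieved chunks
# (assistant_id and thread_id are placeholders).
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    tools=[{"type": "file_search", "file_search": {"max_num_results": 4}}],
    max_prompt_tokens=4000,  # also hard-caps total input tokens for the run
)
```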