I just removed file_search from the tools list, and still get 11000 tokens.
In my initial example, there was no file_search_call in the output. See the list of output items I got. The number is the message length in characters.
You’re using function calling, which also expands the prompt.
Every feature beyond the system message and prompt consumes additional tokens. It's not accurate to copy and paste the code that prepares your request into a tokenizer.
The tokens counted aren't the JSON of the request like you pasted into the tokenizer; they are the language the AI actually sees.
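If you want a rough estimate yourself, tokenize just the message text, not the request JSON. A minimal sketch with tiktoken; the `o200k_base` encoding here assumes a 4o-family model, and the messages are placeholders:

```python
# Rough estimate: count tokens in the message text only. Tool schemas,
# message framing, and other enabled features add overhead on top that
# a plain tokenizer won't show you.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the attached report."},
]

text_tokens = sum(len(enc.encode(m["content"])) for m in messages)
print(f"~{text_tokens} tokens of message text (excludes tool/format overhead)")
```

The authoritative number is always the `usage` field returned with the response, not any client-side count.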
The cached-token count indicates you are reusing a previous response ID and its saved chat state, where you pay again for all the prior chat turns, including the tool calls.
Reset the chat. See the token count drop.
Do not use the Responses API's server-side state via response ID reuse, unless you like paying for the massive chat-cost run-up possible on models with 1M-token input context. OpenAI gives you something unsuitable for untrusted users who aren't paying the bill; manage your own chat history length in coordination with the caching discount and its expiry period.
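A minimal sketch of stateless use with the Python SDK; the model name and messages are placeholders:

```python
# Stateless Responses API call: no previous_response_id, no server-side
# storage. Every call sends exactly the history you choose to include.
from openai import OpenAI

client = OpenAI()

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

response = client.responses.create(
    model="gpt-4o-mini",
    input=history,
    store=False,  # nothing retained server-side; you own the state
)
print(response.output_text)
```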
How can I reset the chat?
By the way, I am not storing responses on the server. I manually manage chat history.
I have the following in my API call payload:
"Reset" meaning: just abandon the chat session and don't reuse any previous response ID. It's the equivalent of a "start a new chat" button for the user (or what the user must do if you enforce a maximum number of turns as your expected service).
If you are in control of all the messages you send, by not reusing a previous response ID, then you have complete visibility and control over everything you resend as input messages.
All the input you send is billed again on every call, even when continuing a chat session. To reduce your costs, reduce the length of the history you preserve.
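For example, one simple trimming policy: keep the system message and only the most recent turns. `MAX_TURNS` is your own tuning choice, not an API parameter:

```python
# Cap input cost by resending only the system message plus the last
# MAX_TURNS exchanges (one turn = a user message and an assistant reply).
MAX_TURNS = 10

def trimmed(history: list[dict]) -> list[dict]:
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    return system + rest[-2 * MAX_TURNS:]
```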
If you are making quick successive calls, at actual chat speed, then you may receive a cache discount of 50% or 75% on the leading portion of the input that is identical to a previous API call's input.
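You can see how much was cached in the `usage` object returned with each response. The field names below are assumed from the current Python SDK's Responses API; keeping the system message and tool definitions byte-identical across calls maximizes the cached prefix:

```python
# Inspect the prompt-cache hit after a call. The discount applies only
# to the identical leading portion of the input.
usage = response.usage
cached = usage.input_tokens_details.cached_tokens
print(f"{cached} of {usage.input_tokens} input tokens were cache hits")
```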