Using Assistant API GPT-4o with File Search enabled automatically ups the tokens used by 3.5k

I’m running tests with the Assistants API via Playground.
Whenever I enable the File Search option, regardless of whether any files have been added to the Assistant or to a thread, each request is charged roughly 3.5k additional input tokens.

The exact same instructions and content, run with two different setups:

  • GPT-4o, File Search disabled: 2982 tokens total
  • GPT-4o, File Search enabled: 6411 tokens total, with all of the overhead on the input side
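
For reference, the overhead implied by those two totals can be worked out directly. This is only arithmetic on the numbers reported above; the variable names are made up for illustration.

```python
# Token totals reported above for the same instructions and content.
tokens_without_file_search = 2982
tokens_with_file_search = 6411

# Extra tokens attributable to simply enabling File Search.
overhead = tokens_with_file_search - tokens_without_file_search

# How much larger the request becomes relative to the baseline.
ratio = tokens_with_file_search / tokens_without_file_search

print(f"Overhead per query: {overhead} tokens")
print(f"Total is {ratio:.2f}x the baseline")
```

So the "3.5k" figure is, more precisely, 3429 tokens in this particular test.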

I would have guessed that I’d be billed extra for using the vector store only when files are actually retrieved, but apparently it’s just a hidden cost of the feature?

When enabling tools, like custom functions, code interpreter or file search, a pretty large set of instructions gets added under the hood alongside whatever system prompt you wrote. It’s not very well documented for some reason. By using some clever prompting you can get the Assistant to spit out its entire instructions, including the “hidden” ones, in a message and calculate the tokens.

Enabling these tools can quickly use up a LOT of tokens, even if you’re not actively using them. There’s no real way around it other than only enabling them when you need them.
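
A minimal sketch of "only enable them when you need them", assuming you decide per request which tools are required. The helper name `tools_for_request` and its flags are hypothetical; the `{"type": "file_search"}` and `{"type": "code_interpreter"}` entries follow the tool format the Assistants API uses.

```python
from typing import Any


def tools_for_request(
    needs_file_search: bool,
    needs_code_interpreter: bool = False,
) -> list[dict[str, Any]]:
    """Return only the tool entries this request actually needs.

    Hypothetical helper: an empty list means no tool instructions
    should be attached for this request.
    """
    tools: list[dict[str, Any]] = []
    if needs_file_search:
        tools.append({"type": "file_search"})
    if needs_code_interpreter:
        tools.append({"type": "code_interpreter"})
    return tools


# A plain chat request carries no tools; a document question enables search.
print(tools_for_request(needs_file_search=False))
print(tools_for_request(needs_file_search=True))
```

The resulting list would then be passed as the `tools` argument when creating the assistant or the run. I haven't measured whether overriding tools per run removes the hidden instructions, so compare the token usage yourself before relying on it.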

EDIT: Trying this on a V2 Assistant that has File Search enabled, here’s the added text in the system prompt:

[Your system prompt would be here, anything under this line is added by the assistant.]
Image input capabilities: Enabled

# Tools

## myfiles_browser

You have the tool `myfiles_browser` with these functions:
`msearch(queries: list[str])` Issues multiple queries to a search over the file(s) uploaded in the current conversation and displays the results.
please render in this format: `【{message idx}†{link text}】`

Tool for browsing the files uploaded by the user.

Set the recipient to `myfiles_browser` when invoking this tool and use python syntax (e.g. msearch(['query'])). "Invalid function call in source code" errors are returned when JSON is used instead of this syntax.

Parts of the documents uploaded by users will be automatically included in the conversation. Only use this tool, when the relevant parts don't contain the necessary information to fulfill the user's request.

Think carefully about how the information you find relates to the user's request. Respond as soon as you find information that clearly answers the request.

Issue multiple queries to the msearch command only when the user's question needs to be decomposed to find different facts. In other scenarios, prefer providing a single query. Avoid single word queries that are extremely broad and will return unrelated results.


Here are some examples of how to use the msearch command:
User: What was the GDP of France and Italy in the 1970s? => msearch(["france gdp 1970", "italy gdp 1970"])
User: What does the report say about the GPT4 performance on MMLU? => msearch(["GPT4 MMLU performance"])
User: How can I integrate customer relationship management system with third-party email marketing tools? => msearch(["customer management system marketing integration"])
User: What are the best practices for data security and privacy for our cloud storage services? => msearch(["cloud storage security and privacy"])



Please provide citations for your answers and render them in the following format: `【{message idx}:{search idx}†{link text}】`.

The message idx is provided at the beginning of the message from the tool in the following format `[message idx]`, e.g. [3].
The search index should be extracted from the search results, e.g. # 【13†Paris†4f4915f6-2a0b-4eb5-85d1-352e00c125bb】refers to the 13th search result, which comes from a document titled "Paris" with ID 4f4915f6-2a0b-4eb5-85d1-352e00c125bb.
For this example, a valid citation would be `【3:13†Paris】`.

All 3 parts of the citation are REQUIRED.

Thanks for the response turbolucius!
I did expect some overhead from the files browser, but never in my wildest dreams would I have thought it would consume 3.5k tokens just on standby :smiley:
The prompt you provided comes to about 600 tokens, which means the assistant must be invoking loads of functions in the background, even with empty vector stores, which I still find pretty odd.
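
To put numbers on that gap, here is a rough breakdown. It assumes the ~600-token estimate for the visible prompt is about right; the totals are the ones from my first post.

```python
# Overhead measured in the original test (File Search on vs. off).
measured_overhead = 6411 - 2982

# Rough size of the injected prompt quoted above (an estimate, not a measurement).
visible_prompt_tokens = 600

# Tokens that the visible prompt alone does not account for.
unexplained = measured_overhead - visible_prompt_tokens
share = unexplained / measured_overhead

print(f"Unexplained overhead: {unexplained} tokens")
print(f"Share of the overhead that is unexplained: {share:.1%}")
```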
I’ve edited my code to only enable the tools when necessary for now. I’ll probably go back to OCR-preprocessing the data, as it’s still cheaper than putting the file into a vector store on the OpenAI side.
