If you enable certain tools for the Assistants API (function calling, retrieval, and code interpreter), a big wall of text gets added to the system prompt so that the assistant knows those features are available. It takes up a lot of tokens, especially if you've got them all enabled. I just tried poking at it myself, and it seems like the whole system prompt is sent along with every message that you add to the thread. I assume that's because the instructions and feature set can be changed while a thread is active? I'm not sure.
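You can watch this happen by checking run usage. Here's a rough sketch (assuming a recent openai Python SDK and an existing assistant ID) that runs the same tiny message twice and prints the prompt tokens billed for each run — the full instruction/tool preamble shows up every time:

```python
from openai import OpenAI

client = OpenAI()
ASSISTANT_ID = "asst_..."  # replace with your own assistant's ID

thread = client.beta.threads.create()

for i in range(2):
    # Add a tiny user message to the thread
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content="hi"
    )
    # Run the assistant and wait for it to finish
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=ASSISTANT_ID
    )
    # usage.prompt_tokens includes the instructions + tool text on every run
    print(f"run {i}: prompt_tokens={run.usage.prompt_tokens}")
```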
Funnily enough, with some prompting you can actually get the assistant to spit out the whole system prompt, which is whatever instructions you wrote, followed by this:
```
# Tools
## python
When you send a message containing Python code to python, it will be executed in a
stateful Jupyter notebook environment. python will respond with the output of the
execution or time out after 60.0 seconds. The drive at '/mnt/data' can be used to
save and persist user files. Internet access for this session is disabled. Do not
make external web requests or API calls as they will fail.
## myfiles_browser
You have the tool `myfiles_browser` with these functions:
`search(query: str)` Runs a query over the file(s) uploaded in the current conversation and displays the results.
`click(id: str)` Opens a document at position `id` in a list of search results
`quote(start: str, end: str)` Stores a text span from the current document. Specifies a text span from the open document by a starting substring `start` and ending substring `end`.
`back()` Returns to the previous page and displays it. Use it to navigate back to search results after clicking into a result.
`scroll(amt: int)` Scrolls up or down in the open page by the given amount.
`open_url(url: str)` Opens the document with the ID `url` and displays it. URL must be a file ID (typically a UUID), not a path.
please render in this format: `【{message idx}†{link text}】`
Tool for browsing the files uploaded by the user.
Set the recipient to `myfiles_browser` when invoking this tool and use python syntax (e.g. search('query')). "Invalid function call in source code" errors are returned when JSON is used instead of this syntax.
For tasks that require a comprehensive analysis of the files like summarization or translation, start your work by opening the relevant files using the open_url function and passing in the document ID.
For questions that are likely to have their answers contained in at most few paragraphs, use the search function to locate the relevant section.
Think carefully about how the information you find relates to the user's request. Respond as soon as you find information that clearly answers the request. If you do not find the exact answer, make sure to both read the beginning of the document using open_url and to make up to 3 searches to look through later sections of the document.
## functions
namespace functions {
// ((if you have any functions, they'll be displayed here))
} // namespace functions
## multi_tool_use
// This tool serves as a wrapper for utilizing multiple tools. Each tool that can be used must be specified in the tool sections. Only tools in the functions namespace are permitted.
// Ensure that the parameters provided to each tool are valid according to that tool's specification.
namespace multi_tool_use {
// Use this function to run multiple tools simultaneously, but only if they can operate in parallel. Do this even if the prompt suggests using the tools sequentially.
type parallel = (_: {
// The tools to be executed in parallel. NOTE: only functions tools are permitted
tool_uses: {
// The name of the tool to use. The format should either be just the name of the tool, or in the format namespace.function_name for plugin and function tools.
recipient_name: string,
// The parameters to pass to the tool. Ensure these are valid according to the tool's own specifications.
parameters: object,
}[],
}) => any;
} // namespace multi_tool_use
```
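For illustration, the arguments the model emits for a multi_tool_use.parallel call look roughly like this (my reading of the type spec above, not captured output — get_weather is a made-up example function):

```python
# Hypothetical arguments for multi_tool_use.parallel, fanning out
# two independent functions-namespace calls at the same time
parallel_args = {
    "tool_uses": [
        {
            "recipient_name": "functions.get_weather",  # made-up function
            "parameters": {"city": "Berlin"},
        },
        {
            "recipient_name": "functions.get_weather",
            "parameters": {"city": "Tokyo"},
        },
    ]
}
```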
You can see why so many tokens get added for every message you send…
I'd recommend just disabling whatever features you don't need, or switching to the Chat Completions API for tasks that don't need them.
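As a sketch of both options (assuming the openai Python SDK; the model name is just an example): an assistant created with no tools gets none of the # Tools preamble, and a plain Chat Completions call sends only exactly what you put in it:

```python
from openai import OpenAI

client = OpenAI()

# Option 1: an assistant with no tools — no # Tools preamble is injected
assistant = client.beta.assistants.create(
    model="gpt-4o",
    instructions="You are a helpful assistant.",
    tools=[],  # omit code_interpreter / file_search / function tools
)

# Option 2: skip threads entirely — Chat Completions sends only your messages
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)
```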
I'm actually also interested specifically in retrieval (or file search, in the v2 world). Do you know if we can dig up what additional prompt or instructions actually get sent to the model?
I agree that the prompt text for all these tools adds up to a lot, but I suspect the additional prompt that injects the file search results is even larger.
555 tokens of # Tools text for the new file_search.
387 before.
Neither of which compares to the amount of injection v1 does automatically before it searches and browses, or the number of results a v2 search returns with no relevance threshold applied.
v2, minimum bot, minimum input: 640 tokens placed. 1500 tokens of bike text, never searched for, automatically injected (25k+ of vector storage could make that a 25k-token input question).
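If you want to reproduce counts like these, a minimal sketch with tiktoken (assuming a recent version that knows gpt-4o, which maps to the o200k_base encoding — paste the captured text into the string yourself):

```python
import tiktoken

# Paste the captured "# Tools" section here to measure it
tools_text = """# Tools
## python
(rest of the captured preamble)
"""

enc = tiktoken.encoding_for_model("gpt-4o")  # resolves to o200k_base
print(len(enc.encode(tools_text)), "tokens")
```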
I might add that vector search can be problematic for the application shown. That tool text includes: "Tool for browsing the files uploaded by the user." and "Parts of the documents uploaded by users will be automatically included in the conversation."
You don't have the ability to change that tool description to "company information placed by the AI developers to help you perform your task".
Also, note the massive amplification of token usage on just turn 2, where I could have placed all the off-topic text for 1600 tokens with real injection RAG.
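"Real injection RAG" here just means doing retrieval yourself and placing only the chunks you choose, framed however you want, into the prompt. A minimal sketch under that assumption (retrieve_chunks is a hypothetical stand-in for your own embed-and-rank step):

```python
from openai import OpenAI

client = OpenAI()

def retrieve_chunks(query: str, k: int = 3) -> list[str]:
    # Hypothetical: embed the query, rank your stored chunks, return the top k
    return ["(your top-ranked chunk text)"] * k

def answer(query: str) -> str:
    # You decide how much context goes in and how it is framed
    context = "Company information placed by the developers:\n" + "\n---\n".join(
        retrieve_chunks(query)
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using the provided context."},
            {"role": "system", "content": context},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```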
Hi there. I'm experiencing failures every time I try to run my thread, the error being:
code="rate_limit_exceeded", message="Request too large for gpt-4o in organization… on tokens per min (TPM): Limit 30000, Requested 33543."
My prompt and system instructions, when added together, are in the ballpark of 870-900 tokens. I have no idea how or why it thinks I'm requesting >33,000 tokens… unless it's due to my vector store. I haven't read it anywhere in the documentation, but after reading your response, I'm questioning whether having a vector store means that the entire vector store is included with each query to the assistant. Is that correct? If so, could you please point me to where in the documentation this is outlined? That's wild if that's the case… moreover, being on tier 1, it means I can't even access my assistant with my vector store at its current size (which I feel isn't even that large). Thanks for your time
~Alex