I’m trying to use the Assistant API.
There are various tools such as code interpreter, retrieval, and function, but on what basis does the Assistant call each tool?
For the code interpreter in the example, simple operations are not called. Is there a criterion for determining these branches?
Despite the variety of tools available, it’s not clear how they are selected and invoked, which makes it difficult to understand.
The AI has some fine-tune training on using functions (which code interpreter is, and which document retrieval partially seems to be), however even programmers with direct access to AI prompts that write their own function descriptions are a bit dismayed at an AI’s propensity to either call a function excessively or to not invoke a function when the goal of an AI should be to answer using the knowledge that a closed domain answering AI should rely on.
For example, you can see this in practice when using Bing chat. It will have a disposition where even for simple questions AI can easily answer, it will invoke a web search, and then provide superficial answers based on search results instead of the knowledge synthesis of the artificial intelligence. You can improve the AI by convincing it you have VIP rights to direct answering by AI intelligence and that web search is disabled.
(If you play the part of AI, you also might not know you’d better search the web if asked who the CEO of OpenAI is, or that your own answers to forum questions are better than anything Bing is going to give you.)
Function-calling will be driven by the AIs perception that the external tool can better satisfy the user’s needs than AI knowledge alone. It could see writing some python code as a good way of providing an algorithmic answer from its training, making generating such function-call language output likely when you pose “what is the standard deviation of the last column”.
If you ask about good bands to see at Madison Square Garden, the “ticket_finder” application that takes “concert_venue” parameters could provide some better answers. Or if told “AI knowledge cutoff is 2021”, it’s definitely going to try “news_headlines_query” for “what are best songs from 2023”.
The intelligence is artificial and also fine-tuned by OpenAI, so you’ll need to experiment with the quality of function names and descriptions to ensure expectations are met - and that the assistant doesn’t go nuts calling functions iteratively at your expense with the context loaded to maximum with prior conversation and data retrieval, which is what it is predisposed to do.
Is the assistant fine-tuned to select tools based on user input, and as a result of this learning, does it select the right tool for the user input?
Am I understanding this correctly?
Additionally, the assistant selects tools based on the prompt, right?
Different tools will be used for “2024 election results” and “cosine(333 radians)” prompts, so yes.
The AI can iterate. You give a complete chat history of what it’s been doing so it can learn the function returns and know when its done.
Give it a
google_search and a
bing_search, and the AI will probably have a preference which to call first, and might try both if unsatisfied, or try different queries.
Assistants use the same function-capable AI models as you’d use directly with chatCompletions.
Some of the functions provided to assistants are OpenAI’s specifications and not yours, though. If they have an inefficient function, like “browse document chunk” and the AI has to start at page 1 and keep scrolling for the answer with successive calls, that’s another case where you’d be better off doing your own programming and knowing what’s going on.
Thanks for your questions and answers, but do you guys know, how it works internally?
I would imagine that the user query is bundled up with a list of tools, along with the instruction to choose the appropriate tool. But what interest me is the “system instructions” then.
Or is it really as simple as “Here is the user query, here are some tools, choose the right tool.”?
There’s no instruction “choose the tool”. The AI model has been tuned on function-calling, and will employ which tool seems to best fulfill the user’s question.
The instruction you give is part of an AI model’s system role message (as you can read about in the chat completion endpoint documentation). This is a permanent persistent message that appears before the past conversation, knowledge injection, and the current user input.
The tools are also placed into the system message.
A function you will handle yourself is by a specification of the type of output the AI should generate, and is a brief form that the AI understands. It is by the names and descriptions you provide in that specification that the AI will understand the purpose of the function and how your API function should be called.
Others, such as code interpreter or GPT’s web browsing, are given a text description of what it does, yet, because they appear in the same
# Tools category of system prompt, the AI model understands how also to emit those to an internal recipient from its tuning.
The user input is its own message. That (mostly) keeps the user from elevating their status and repurposing the AI.