I’ve implemented a couple of tools, and I’m getting inconsistent results across successive calls that retrieve transient results. The simplest is get_timeofday, which returns the current datetime information. By definition, the results are transient, so the tool call should always be made on subsequent references to the time. In the tool, I set lifespan_ts to 1 second and generated an expiration_ts field in the result. Before submitting chat/completion requests, I review the chat history, and if any tool results have expired (expiration_ts < now), I remove them from the messages array, along with the preceding assistant message whose content is empty. However, because the result references the current time, the model wants to reuse this information, so it seems I would need to remove the entire volley from the chat history before submission. This may work for individual tool invocations, but it falls apart for multi-tool invocations (potentially with different expiration times).
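For concreteness, here is a minimal sketch of the pruning step I described. It assumes an application-defined expiration_ts field (epoch seconds) on each tool message — not a standard API field — and it drops the preceding assistant tool-call message only when every one of its calls has expired, which is exactly where the multi-tool case gets awkward:

```python
import time

def prune_expired_tool_volleys(messages, now=None):
    """Remove expired tool results plus the assistant message that
    requested them (the whole 'volley'), so the model re-invokes the
    tool instead of reusing stale data.

    Assumes each tool message carries a custom 'expiration_ts' field
    (epoch seconds) stamped by the application when the result arrived.
    """
    now = time.time() if now is None else now

    # Collect the tool_call_ids whose results have expired.
    expired_ids = {
        m["tool_call_id"]
        for m in messages
        if m.get("role") == "tool"
        and m.get("expiration_ts", float("inf")) < now
    }

    pruned = []
    for m in messages:
        # Drop the stale tool result itself.
        if m.get("role") == "tool" and m.get("tool_call_id") in expired_ids:
            continue
        # Drop the assistant tool-call message only if ALL of its calls
        # expired; a mixed volley (some fresh, some stale) is left intact,
        # which is the unresolved multi-tool problem.
        if m.get("role") == "assistant" and m.get("tool_calls"):
            if all(tc["id"] in expired_ids for tc in m["tool_calls"]):
                continue
        pruned.append(m)
    return pruned
```

The application would call this on the history immediately before each chat/completions request; expiration_ts must be stripped from any messages actually sent, since the API rejects unknown fields.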
Managing conversation context is an art, and it quickly lapses into application-layer considerations. Consider advice in a response based on multiple tool calls that form a transaction (e.g., buy or sell stock based on the current price, and profit or loss against current holdings). The stock price is transient, and so are the portfolio holdings. But if action is taken, we can’t simply remove the messages from the conversation context once they have expired.
I expect many tools are used to supplement the model’s knowledge with transient or more current information. I think the function_calling design should incorporate a lifespan concept, and message entries could optionally carry an expiration time so the model could determine whether to reuse either the tool results or the prior assistant response that incorporated them. This decision must be made in conjunction with the current utterance being processed, so it appears the model must make these determinations.
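What I have in mind could look something like the following sketch, where the tool handler derives the (hypothetical) expiration_ts from a per-tool lifespan at result-construction time. All field names beyond the standard role/tool_call_id/content are assumptions, not part of the current API:

```python
import json
import time
from datetime import datetime, timezone

# Hypothetical per-tool lifespan: get_timeofday results are valid
# for one second before they should be considered stale.
LIFESPAN_S = 1.0

def run_get_timeofday(tool_call_id):
    """Build a tool-result message stamped with a non-standard
    'expiration_ts' derived from the tool's lifespan."""
    now = time.time()
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(
            {"datetime": datetime.now(timezone.utc).isoformat()}
        ),
        # Application-layer field: stripped before the request is sent,
        # consulted when deciding whether the result has expired.
        "expiration_ts": now + LIFESPAN_S,
    }
```

In the proposal, the model (rather than the application) would read the expiration time and decide, in light of the current utterance, whether to reuse the result or call the tool again.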
I’ve received advice to add tool-specific instructions to the system prompt to help the model handle responses from specific tools. I don’t like the idea of bloating the system prompt with tool-specific handling advice, especially when those functions aren’t expected to be called very often and tool use cases are expected to proliferate. Instead, I’d prefer to use the tool definition’s description field as a “tool prompt” that advises the model how to handle the tool’s invocation, results, and response generation.
For example, for a get_timeofday tool, you’d want to specify that the tool results are transient and that the tool should always be called to satisfy time inquiries rather than reusing past references to time. For a tool like get_stock_price or get_current_weather, prior results may be considered valid for a given duration (to reduce API calls), which will likely vary by use case.
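As a sketch of that approach, here are two tool definitions whose description fields carry the handling guidance (the wording and the 60-second reuse window are illustrative assumptions, and whether the model reliably honors such instructions is exactly the open question):

```python
# Tool definitions in the standard chat/completions "tools" shape,
# with the description field doubling as a per-tool prompt.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_timeofday",
            "description": (
                "Returns the current date and time. Results are transient: "
                "ALWAYS call this function to answer any question about the "
                "current time; never reuse a previously returned value."
            ),
            "parameters": {"type": "object", "properties": {}},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": (
                "Returns the latest quote for a ticker symbol. A quote may "
                "be reused for up to 60 seconds after it was retrieved; "
                "beyond that, call this function again rather than reusing "
                "an earlier result."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Ticker symbol, e.g. AAPL",
                    }
                },
                "required": ["symbol"],
            },
        },
    },
]
```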
Allowing tool-specific prompt instructions gives tool authors (and/or users) control over when and how the tool is invoked, and how its results are consumed by the model or used for response generation.
While I can manipulate the messages sent in chat/completion requests, it seems there could be more architecturally controlled fields added to the function-calling design to make these behaviors more uniform and well-behaved. For multi-tool integrations, the general system prompt may still be needed to override or extend individual tool prompts.
Is the function_calling API deemed complete, or still a work in progress?