Function calling responds in parallel but differently in the tool call and in the message thread

when i respond to a function call by submitting tool outputs, the model returns two different responses:

  • one in the tool call response returned by the function call on the run
  • one in the message list from the thread

these two responses are different, and trying to use one over the other moving forward is inconsistent, since each answer sometimes has different details that are useful to the continuing conversation. i am unsure whether to respond by submitting tool outputs or by responding to the message thread

if i respond to the message thread, should i kill the run? if so, it feels like i’m missing something obvious about how run lifecycles operate in parallel with the continuing thread. i can apparently have a back-and-forth conversation just by submitting tool outputs, while similar but diverging messages from the model keep appearing on the thread even though i’m not responding to the thread directly
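
for reference, here’s a minimal sketch of the two places i’m looking, using the openai python sdk (v1.x); the ids and output payload are placeholders:

```python
from openai import OpenAI

client = OpenAI()

THREAD_ID = "thread_abc123"  # placeholder ids for illustration
RUN_ID = "run_abc123"

# path 1: answer the function call by submitting tool outputs on the run
run = client.beta.threads.runs.submit_tool_outputs(
    thread_id=THREAD_ID,
    run_id=RUN_ID,
    tool_outputs=[
        {"tool_call_id": "call_abc123", "output": '{"result": "..."}'},
    ],
)

# path 2: read whatever the model has appended to the thread as messages
messages = client.beta.threads.messages.list(thread_id=THREAD_ID)
print(messages.data[0].content[0].text.value)  # newest message first by default
```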

Not sure I got exactly what you’re saying, but the tool_call response would just be an acknowledgement that your function call responses were accepted. In the message list I would expect a regular GPT ‘response’ based on the way it has processed the function calls. Is that not what you mean?

hey @jlvanhulst i appreciate you responding. perhaps i am misunderstanding the intended use case of function calling. my understanding is that it is used to ensure the assistants api returns a properly formatted response for use in external functions.

ideally, standard messages in the thread would simply carry this format assurance and the parallel function calling would not be needed, but as it is, both operate simultaneously. the problem occurs when the message in the thread contains a better response than the tool call output, but the message is not in a format that can be predictably parsed into the needed structure, so only the inferior response is usable.

sometimes the message in the thread has additional information that is referred to by subsequent responses, but if that message is not usable, the needed context is missing from the external systems that use the output.

for example, one use case i have is a dynamic story where the model creates a narrative, asks the user questions, and continues the story based on the user’s response. i need the response in a specific, predictable format for use elsewhere, but sometimes the message has better narrative content than the tool call output.

sometimes the message even includes additional information that is not in the tool call output but is referenced by the model’s next response, which is really confusing for the user. so if we’re at point A in our story, we have two outputs, A1 and A2, which have different amounts of narrative information. if i choose output A2 from the tool call output, then when the user gets to point B in the story, the model references information in A1 that was not in A2.

at this point the assistants api is not usable for my use case, so i am using standard chat completions and just uploading all the context in each call. for assistants threads to be usable, either the divergence issue would need to be solved, or a json response format added to the message output.
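
for what it’s worth, the workaround looks roughly like this (openai python sdk v1.x; the model name, schema, and history are just illustrative):

```python
from openai import OpenAI

client = OpenAI()

# the full story context is re-sent on every call, since there is no thread
history = [
    {
        "role": "system",
        "content": 'You are a storyteller. Respond only with JSON like '
                   '{"narrative": "...", "question": "..."}.',
    },
    {"role": "user", "content": "Start a story about a lighthouse keeper."},
    # ...every prior narrative turn and user answer appended here...
]

completion = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=history,
    response_format={"type": "json_object"},  # available on chat completions
)

print(completion.choices[0].message.content)  # a JSON string to parse downstream
```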

Perhaps so.

A tool (only functions are allowed) is an external access method to a function of your code that the AI may employ.

A function might be something that informs the AI, like a calculator, a database, or a web search.

A function might be something like an action or trigger, like posting a tweet, or even hanging up on the user.

What it is not, especially within assistants, is a way to get a response in a particular format.
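
For illustration, an ‘informs the AI’ kind of tool might be declared something like this (the name and schema are invented for the example):

```python
# a hypothetical calculator tool: something that informs the AI, not a formatter
calculator_tool = {
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a basic arithmetic expression and return the result.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "An arithmetic expression, e.g. '12 * (3 + 4)'",
                }
            },
            "required": ["expression"],
        },
    },
}
```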

For that, you would just specify to the AI the output you want written and its purpose. You can be quite robust in your description, such as providing a full JSON schema with descriptions and examples as the response specification (which could also be used to validate the AI output).
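
A rough sketch of that approach, assuming the `jsonschema` package for validation; the schema itself is only an example:

```python
import json
from jsonschema import ValidationError, validate

# example response specification; embed it in the assistant's instructions,
# then reuse the same schema to check what comes back
story_schema = {
    "type": "object",
    "properties": {
        "narrative": {"type": "string", "description": "The next beat of the story."},
        "question": {"type": "string", "description": "A question posed to the user."},
    },
    "required": ["narrative", "question"],
    "additionalProperties": False,
}

instructions = (
    "Reply only with a JSON object that matches this schema, with no prose around it:\n"
    + json.dumps(story_schema, indent=2)
)

def parse_reply(raw_text: str) -> dict:
    """Parse the assistant's message text and validate it against the schema."""
    data = json.loads(raw_text)
    try:
        validate(instance=data, schema=story_schema)
    except ValidationError as err:
        raise ValueError(f"Model output failed schema validation: {err.message}")
    return data
```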

how is a function to be properly called with the correct inputs if the response is not in a particular format?

currently the json_object response_format is not available in the assistants api. it is only available in the chat completions api.

the standard message response is unpredictable in its format without that feature. even 10% of responses being incorrectly formatted makes a product unusable for users.

The tool call mechanism of gpt-4-turbo models uses the response_format JSON technique, which can actually be problematic.

The output of functions is reliable because you create a specification in a particular format, and the AI has extensive fine-tuning on how to write and send functions.

And you can otherwise control any unpredictability by not using assistants but chat completions, where you can control the sampling parameters. If you said “assistants is not the best way to do X”, you’d be right for the largest subset of tasks.
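
A quick sketch of what that looks like on chat completions (model and values are arbitrary):

```python
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Continue the story in one short paragraph."}],
    temperature=0,  # low temperature for more repeatable phrasing
    top_p=1,
    seed=42,        # best-effort reproducibility across calls
)
print(completion.choices[0].message.content)
```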