Keeping Tool Calls Within Context Window

Hello everyone, as far as I'm aware, one can only feed the content from the assistant's reply back into the API; there is no category to store the functions the assistant called. So when you feed the context back into the API, it "forgets" that it has called functions, because you can only give it {role: assistant, content: (…)} and not {role: assistant, content: (…), tool_call: (…)}.

(How) can I keep the functions the API calls in the context window? Is there an intended way to do this?

Thanks in advance!

I have been thinking about this ever since function calling was added but never actually tried it. Have you tried persisting the tool calls and their outputs in the context? Like…

messages = context
messages.push(query)

// 1st API call: the model responds by invoking a tool call
messages.push(tool_call_response)  // the assistant message containing the tool call
messages.push(tool_call_output)    // the result of running the tool

// 2nd API call: the model answers using the tool output
context = messages
context.push(second_response)

The next time you call the API, context contains everything. I have not tried doing it this way though.
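For what it's worth, here is a minimal sketch of that flow using the tools interface of the v1 openai Python SDK; the get_current_weather tool and the hard-coded result are placeholders for illustration, not a tested implementation:

import json
from openai import OpenAI  # assumes the v1 openai Python SDK

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",  # placeholder tool for illustration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Rome?"}]

# 1st API call: the model may respond with a tool call instead of content
response = client.chat.completions.create(
    model="gpt-3.5-turbo", messages=messages, tools=tools
)
assistant_msg = response.choices[0].message

# Persist the tool call itself in the context...
messages.append(assistant_msg)

# ...and persist each tool output as a "tool" role message, keyed to the call id
for tool_call in assistant_msg.tool_calls or []:
    result = {"city": "Rome", "weather": "cloudy, 17°C"}  # stand-in for actually running the tool
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    })

# 2nd API call: the model now sees the tool call and its result in the context
response = client.chat.completions.create(
    model="gpt-3.5-turbo", messages=messages, tools=tools
)
messages.append(response.choices[0].message)
# `messages` now carries the tool calls across turns, so nothing is "forgotten"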

I copied the tool call response I got from the API into the assistant's content. Like…

messages = context
response = ...  # some method calling the API
# Extract the tool_call information received from the API and append it to the
# context as if the LLM had put it into the content field.
messages.append({"role": "assistant", "content": f"{tool_call(response)}"})

However, that confuses the LLM and leads it to try to call tools in the same format that I put into the context window. That format is different from the way the LLM needs to format its output for OpenAI's assistant implementation to recognize what is a tool call and what is normal content, so it defaults back to plain content. I hope that makes sense.

I tested another way that seems to work without problems: formatting the message content as JSON.

message.push({role: 'user', content: JSON.stringify({ text: "lorem ipsum…" })})

Then when you receive the response from the API, you just put it in the same format.

message.push({role: 'assistant', content: JSON.stringify({ text: "lorem ipsum…", tool: "…" })})

user: {"text":"hello"}
assistant: {"text":"I see you're here! How can I assist you today?","tool":""}
user: {"text":"what is the weather in rome?"}
assistant: {"text":"The current weather in Rome is cloudy with a temperature of 17°C.","tool":"get_current_weather"}
user: {"text":"thanks. how about the traffic?"}
assistant: {"text":"The current traffic situation in Rome is showing heavy congestion. Please plan your travel accordingly.","tool":"get_current_traffic"}

The disadvantage here is the token count, but you can send plain-text content by stripping out the other fields before sending.

context = context.map((item) => {
    const content = JSON.parse(item.content)  // parse the stored JSON wrapper
    return {
        ...item,
        content: content.text  // keep only the plain text before sending
    }
})

I implemented something similar yesterday that seemed to work quite well, at least with 3.5 turbo. I haven't tested 4 turbo yet. What seemed to help was inserting the tool call information into the assistant's content in a way that emulates the LLM planning out the tool call.

response = {"role": "assistant", "content": None, "tool_name": name, "tool_params": params}
messages.append({"role": "assistant",
                 "content": f"I'll now call {response['tool_name']} with {response['tool_params']}."})
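For context, here is a minimal sketch of how that trick could sit in a request loop, assuming the v1 openai Python SDK; run_tool, get_current_weather, and the wording of the "planning" sentence are placeholders for illustration, not the original code:

from openai import OpenAI  # assumes the v1 openai Python SDK

client = OpenAI()

def run_tool(name, **params):
    # Stand-in for actually executing the tool
    return {"weather": "cloudy, 17°C"}

def remember_tool_call(messages, tool_name, tool_params, tool_result):
    # Emulate the LLM "planning" the call in plain assistant content...
    messages.append({"role": "assistant",
                     "content": f"I'll now call {tool_name} with {tool_params}."})
    # ...and keep the result in plain content too, so later turns can see it
    messages.append({"role": "assistant",
                     "content": f"{tool_name} returned: {tool_result}"})

messages = [{"role": "user", "content": "What is the weather in Rome?"}]

# After the model asks for a tool, run it and persist both the "plan" and the result
remember_tool_call(messages, "get_current_weather", {"city": "Rome"},
                   run_tool("get_current_weather", city="Rome"))

# The next API call sees the tool call and its output as ordinary assistant content
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)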