Has anyone managed to get a tool_call working when stream=True?

Hi there! I’m new here, so please forgive any poor choices, etc. I have been playing around with the OpenAI API for a few months now, and this is how I previously handled function calls and streaming in Python:

delta = chunk["choices"][0]["delta"]
if delta.get("function_call"):
    # The first chunk carries the function name; later chunks carry argument fragments.
    if "name" in delta["function_call"]:
        function_name = delta["function_call"]["name"]
        function_called = True
    function_arguments_chunk = delta["function_call"].get("arguments", "")
    function_arguments += function_arguments_chunk
    print(function_arguments_chunk, end='', flush=True)

However, since function calls are now deprecated, I was wondering if anyone had a solution to get something like this working with the new GPT-4-1106-preview model with streaming and handling multiple tool calls?

I have deduced that a tool call is now signalled via finish_reason; however, I am unsure whether this is still the case while streaming a response.

I’ll have to do some more digging, but any help is appreciated!

Many thanks
:smiley:

2 Likes

I was just looking into this myself and your post popped up. According to the OpenAI OpenAPI spec, tool call chunks have an index property, which should be present on each returned chunk. This should allow for demarcation of array elements, one per tool call. Haven’t tried it yet, but hope this helps.
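
Something like this, maybe (completely untested; response here is assumed to be a streaming chat completions object from the new 1.x Python SDK):

import json

# Untested sketch: accumulate streamed tool-call fragments by their index.
calls = {}
for chunk in response:
    delta = chunk.choices[0].delta
    for tc in delta.tool_calls or []:
        entry = calls.setdefault(tc.index, {"id": "", "name": "", "arguments": ""})
        if tc.id:
            entry["id"] = tc.id
        if tc.function and tc.function.name:
            entry["name"] = tc.function.name
        if tc.function and tc.function.arguments:
            entry["arguments"] += tc.function.arguments

# After the stream ends, each arguments string should be complete JSON.
for index, entry in sorted(calls.items()):
    print(entry["name"], json.loads(entry["arguments"] or "{}"))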

1 Like

I took a look at this - thanks! However, this still seems to be using the old API, as functions are still mentioned. This is what my chunk.choices looks like (with stream=True):

[Choice(delta=ChoiceDelta(content='', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0)]

Even when running with curl (without streaming):

curl https://api.openai.com/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer $API_KEY" \
-d '{
  "model": "gpt-4-1106-preview",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather like in Boston?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
              "type": "object",
              "properties": {
                  "location": {
                      "type": "string",
                      "description": "The city and state, e.g. San Francisco, CA",
                  },
                  "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
              },
              "required": ["location"],
          },
      },
    }
  ],
  "stream": false
}'
OUTPUT:
{"id":"chatcmpl-8KSJXuiGM4kKM8RyWHfZ9HJrnc2uW","object":"chat.completion","created":1699886091,"model":"gpt-4-1106-preview","choices":[{"index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"id":"call_3JPawcsAOmu6Kq8jtpELuFcu","type":"function","functio...

And when running curl with streaming, I get no response:

curl https://api.openai.com/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer $API_KEY" \
 -d '{
  "model": "gpt-4-1106-preview",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather like in Boston?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
              "type": "object",
              "properties": {
                  "location": {
                      "type": "string",
                      "description": "The city and state, e.g. San Francisco, CA",
                  },
                  "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
              },
              "required": ["location"],
          },
      },
    }
  ],
  "stream": true
}'
gpu@gpu-server:~$ [NO RESPONSE]

So it could be that tool_calls aren’t supported yet with streaming.

Unless someone can try the curl command with streaming enabled and post their output, to check whether it’s an issue on my end, I guess I’ll just wait and see what happens.

1 Like

@Xeniox : I also performed some experiments and agree with your assessment. Posted a bug report here for tracking by OpenAI.

1 Like

Are you still having this problem? I may have a workaround.

2 Likes

@Cristhiandcl8 : If you have a workaround, please post it. The original poster had no luck with curl and I had no luck with the new Python SDK, built from the OpenAPI spec with Stainless. With curl, I thought there could be a problem with buffering the server-sent events (possibly mitigated by --no-buffer), but that seems to not be the case.

1 Like

I saw that documentation on the Assistants API limitation as part of the release notes. We’re talking about the standard Chat Completions API here. While I believe it is quite possible that the Assistants API is built on top of the Chat Completions API, I don’t think we can infer from that that Chat Completions does not support streaming of tool calls.

3 Likes

I tested streaming in response to the other post yesterday. Just follow the raw output to see what to expect; it may help guide you in adjusting your code.

1 Like

I got tool calls working on streaming calls to chat completions last weekend. It was a pain, as it’s not documented, and it’s unclear what’s happening when you dump the chunks to text. Plus, there may be multiple tool/function calls in a single response now. It’s doable, though, and it does work. The relevant snippet is this:

tool_calls = []

# build up the response structs from the streamed response, simultaneously sending message chunks to the browser
for chunk in response:
    delta = chunk.choices[0].delta
    #app.logger.info(f"chunk: {delta}")

    if delta and delta.content:
        # content chunk -- send to browser and record for later saving
        socket.send(json.dumps({'type': 'message response', 'text': delta.content }))
        newsessionrecord["content"] += delta.content

    elif delta and delta.tool_calls:
        tcchunklist = delta.tool_calls
        for tcchunk in tcchunklist:
            if len(tool_calls) <= tcchunk.index:
                tool_calls.append({"id": "", "type": "function", "function": { "name": "", "arguments": "" } })
            tc = tool_calls[tcchunk.index]

            if tcchunk.id:
                tc["id"] += tcchunk.id
            if tcchunk.function.name:
                tc["function"]["name"] += tcchunk.function.name
            if tcchunk.function.arguments:
                tc["function"]["arguments"] += tcchunk.function.arguments
9 Likes

Discovered what was happening in my case (OpenAI Python SDK 1.3). With the previous streaming implementation for functions or content, you were always able to determine the type of response from the first chunk. With tool calls, the first chunk can actually have content, function_call, and tool_calls all set to None, so you have to sniff multiple chunks from the response before you can determine what kind of response you are accumulating.

There is an additional caveat: a tool-call chunk delta always presents an array of length 1, containing an object that carries the index inside it. This was non-obvious from looking at the OpenAPI spec. Here’s an example of such a delta:

delta: ChoiceDelta(
    content=None,
    function_call=None,
    role=None,
    tool_calls=[
        ChoiceDeltaToolCall(
            index=0,
            id='call_uGViZDuQa8pAApH3NnMC9TX9',
            function=ChoiceDeltaToolCallFunction(arguments='', name='read'),
            type='function'
        )
    ]
)

Here’s my implementation with the new Python SDK (handling the legacy function calls really should be separate logic, but…):

    from collections import defaultdict
    tool_calls = [ ]
    index = 0
    start = True
    for chunk in response:
        delta = chunk.choices[ 0 ].delta
        if not delta: break
        if not delta.function_call and not delta.tool_calls:
            if start: continue
            else: break
        start = False
        if delta.function_call:
            if index == len( tool_calls ):
                tool_calls.append( defaultdict( str ) )
            if delta.function_call.name:
                tool_calls[ index ][ 'name' ] = delta.function_call.name
            if delta.function_call.arguments:
                tool_calls[ index ][ 'arguments' ] += (
                    delta.function_call.arguments )
        elif delta.tool_calls:
            tool_call = delta.tool_calls[ 0 ]
            index = tool_call.index
            if index == len( tool_calls ):
                tool_calls.append( defaultdict( str ) )
            if tool_call.id:
                tool_calls[ index ][ 'id' ] = tool_call.id
            if tool_call.function:
                if tool_call.function.name:
                    tool_calls[ index ][ 'name' ] = tool_call.function.name
                if tool_call.function.arguments:
                    tool_calls[ index ][ 'arguments' ] += (
                        tool_call.function.arguments )

Hope this helps.

2 Likes

This is how I did it:

recovered_pieces = {
    "content": None,
    "role": "assistant",
    "tool_calls": {}
}

for chunk in response:
    delta = chunk.choices[0].delta
    if delta.content is None:
        if delta.tool_calls:
            piece = delta.tool_calls[0]
            recovered_pieces["tool_calls"][piece.index] = recovered_pieces["tool_calls"].get(
                piece.index,
                {"id": None, "function": {"arguments": "", "name": ""}, "type": "function"}
            )
            if piece.id:
                recovered_pieces["tool_calls"][piece.index]["id"] = piece.id
            if piece.function.name:
                recovered_pieces["tool_calls"][piece.index]["function"]["name"] = piece.function.name
            recovered_pieces["tool_calls"][piece.index]["function"]["arguments"] += piece.function.arguments
    else:
        yield delta.content
3 Likes

This totally worked! I had to convert tool_calls to an array to pass it back to ChatGPT:

recovered_pieces['tool_calls'] = [recovered_pieces['tool_calls'][key] for key in recovered_pieces['tool_calls']]
messages.append(recovered_pieces)
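
(Equivalently: list(recovered_pieces['tool_calls'].values()), since the chunks arrive in index order; sort by key first if you don’t want to rely on that.)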

1 Like

@Xeniox

Is it OK if a moderator closes this topic?

Do you think this code can be adapted for the version with several tools that have dependencies between them?

For example, function B uses the results of function A as arguments.

I specified in the system message and in the function description that function B depends on function A, but the gpt-3.5-turbo-1106 version with chat completions does not take this into account.

Interestingly, when I used an assistant, it took the dependency into account. But streaming for assistants is not available.

I use NextJS, and in my opinion it’s currently impossible to get streaming working with tool_call. They should just fix the ‘stream: true’ flag, which is also not working in the Assistants API. I really want to move forward and integrate all the new stuff, but I just can’t release any substantial update without a functioning streaming option.

Check out this example from vercel/ai, the official library for building AI apps with Vercel/Next.js.

Here is an example with stream: true and tools.

API route handler:

import OpenAI from 'openai';
import {
  OpenAIStream,
  StreamingTextResponse,
  Tool,
  ToolCallPayload,
  experimental_StreamData,
} from 'ai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const tools: Tool[] = [
  {
    type: 'function',
    function: {
      name: 'get_current_weather',
      description: 'Get the current weather',
      parameters: {
        type: 'object',
        properties: {
          location: {
            type: 'string',
            description: 'The city and state, e.g. San Francisco, CA',
          },
          format: {
            type: 'string',
            enum: ['celsius', 'fahrenheit'],
            description:
              'The temperature unit to use. Infer this from the users location.',
          },
        },
        required: ['location', 'format'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'eval_code_in_browser',
      description: 'Execute javascript code in the browser with eval().',
      parameters: {
        type: 'object',
        properties: {
          code: {
            type: 'string',
            description: `Javascript code that will be directly executed via eval(). Do not use backticks in your response.
           DO NOT include any newlines in your response, and be sure to provide only valid JSON when providing the arguments object.
           The output of the eval() will be returned directly by the function.`,
          },
        },
        required: ['code'],
      },
    },
  },
];

export async function POST(req: Request) {
  const { messages } = await req.json();

  const model = 'gpt-3.5-turbo-0613';

  const response = await openai.chat.completions.create({
    model,
    stream: true,
    messages,
    tools,
    tool_choice: 'auto',
  });

  const data = new experimental_StreamData();
  const stream = OpenAIStream(response, {
    experimental_onToolCall: async (
      call: ToolCallPayload,
      appendToolCallMessage,
    ) => {
      for (const toolCall of call.tools) {
        // Note: this is a very simple example of a tool call handler
        // that only supports a single tool call function.
        if (toolCall.func.name === 'get_current_weather') {
          // Call a weather API here
          const weatherData = {
            temperature: 20,
            unit: toolCall.func.arguments.format === 'celsius' ? 'C' : 'F',
          };

          const newMessages = appendToolCallMessage({
            tool_call_id: toolCall.id,
            function_name: 'get_current_weather',
            tool_call_result: weatherData,
          });

          return openai.chat.completions.create({
            messages: [...messages, ...newMessages],
            model,
            stream: true,
            tools,
            tool_choice: 'auto',
          });
        }
      }
    },
    onCompletion(completion) {
      console.log('completion', completion);
    },
    onFinal(completion) {
      data.close();
    },
    experimental_streamData: true,
  });

  data.append({
    text: 'Hello, how are you?',
  });

  return new StreamingTextResponse(stream, {}, data);
}
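
As I understand it, when the tool-call handler returns a second streaming chat.completions.create request, OpenAIStream continues streaming that follow-up response to the client; that is what completes the tool-call round trip.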

Client-side:

import { ChatRequest, ToolCallHandler, nanoid } from 'ai';
import { Message, useChat } from 'ai/react';

...

const toolCallHandler: ToolCallHandler = async (chatMessages, toolCalls) => {
    let handledFunction = false;
    for (const tool of toolCalls) {
      if (tool.type === 'function') {
        const { name, arguments: args } = tool.function;

        if (name === 'eval_code_in_browser') {
          // Parsing here does not always work since it seems that some characters in generated code aren't escaped properly.
          const parsedFunctionCallArguments: { code: string } =
            JSON.parse(args);

          // WARNING: Do NOT do this in real-world applications!
          eval(parsedFunctionCallArguments.code);

          const result = parsedFunctionCallArguments.code;

          if (result) {
            handledFunction = true;

            chatMessages.push({
              id: nanoid(),
              tool_call_id: tool.id,
              name: tool.function.name,
              role: 'tool' as const,
              content: result,
            });
          }
        }
      }
    }

    if (handledFunction) {
      const toolResponse: ChatRequest = { messages: chatMessages };
      return toolResponse;
    }
  };

const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat-with-tools',
    experimental_onToolCall: toolCallHandler,
  });
...

Thanks for this @supershaneski! Do you know if this pattern creates a loop for the LLM to call the tool multiple times or just supports 1 call?

Any solution for this error: An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_7iGLlP788Y3p6A1fPP27Vy0L

That indicates that you did not properly pass the required pairing of the prior assistant output and the tool response. Both must be appended consecutively after the most recent user input, with matching tool IDs.
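
For example, the message list needs to be shaped like this (a sketch; the weather payload is invented, and the id matches the one from your error):

import json

messages = [
    {"role": "user", "content": "What is the weather like in Boston?"},
    # 1) The assistant message that requested the tool call, echoed back verbatim.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_7iGLlP788Y3p6A1fPP27Vy0L",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": '{"location": "Boston, MA"}',
            },
        }],
    },
    # 2) One role="tool" message per tool_call_id, immediately afterwards.
    {
        "role": "tool",
        "tool_call_id": "call_7iGLlP788Y3p6A1fPP27Vy0L",
        "content": json.dumps({"temperature": 20, "unit": "C"}),
    },
]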

Here’s a linear code write-up, where code blocks 3 and 4 show how to construct the assistant tool_call message and the tool response to be placed.