Realtime API Tool calling problems - no response when a Tool is included in the session

liquidshadowsmk · October 29, 2024, 7:56am

@philippeWander, a couple of observations I have made that might help:

Assumption: Python implementation.

Regarding tool calls: as you know, when the websocket connection is opened, OpenAI creates the session and we have to send a session.update. Include the tools in that session.update event.
When the function call happens, you will receive a series of response.function_call_arguments.delta events.
Wait for the response.function_call_arguments.done event to arrive.
Then send a conversation.item.create event like so:

# call_id, call_name and call_arguments describe the function call the model made;
# output_message is the result your own function returned.
conversation_item = {
    "type": "conversation.item.create",
    "previous_item_id": None,  # You can set this appropriately
    "item": {
        "id": f"msg_{call_id}",
        "type": "function_call_output",
        "status": "completed",
        "role": "system",
        "call_id": call_id,
        "content": {
            "call_id": call_id,
            "name": call_name,
            "arguments": call_arguments,
            "output": output_message,
        },
    },
}

Then the response.done event will contain the same call_id, with its status shown as completed.
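For comparison, the Realtime API reference describes a leaner function_call_output item than the one above: just `type`, `call_id`, and `output`, where `output` is a string (typically JSON). Below is a minimal sketch, assuming that documented shape, that collects the streamed `response.function_call_arguments.delta` chunks per `call_id`, parses them when the `.done` event arrives, and builds the reply event. `ToolCallBuffer` and `function_call_output_event` are hypothetical helper names, not part of any SDK.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, Optional


class ToolCallBuffer:
    """Hypothetical helper: collects streamed function-call arguments per call_id."""

    def __init__(self) -> None:
        self._chunks: Dict[str, List[str]] = defaultdict(list)

    def on_event(self, event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Feed every server event; returns the finished call once .done arrives."""
        etype = event.get("type")
        if etype == "response.function_call_arguments.delta":
            self._chunks[event["call_id"]].append(event["delta"])
            return None
        if etype == "response.function_call_arguments.done":
            streamed = "".join(self._chunks.pop(event["call_id"], []))
            # Prefer the full argument string on the done event; fall back to the deltas.
            raw = event.get("arguments") or streamed
            return {"call_id": event["call_id"], "arguments": json.loads(raw or "{}")}
        return None


def function_call_output_event(call_id: str, result: Any) -> Dict[str, Any]:
    """Build a conversation.item.create event carrying a tool result."""
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,  # leaving this out produces the 'item.call_id' error
            "output": json.dumps(result),  # output is sent as a string
        },
    }
```

Feed every decoded server event to `on_event`; when it returns a dict, run your function with the parsed arguments and send `function_call_output_event(...)` serialized with `json.dumps`.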

Watch for error events from OpenAI such as this one:

Received event from OpenAI: {'type': 'error', 'event_id': 'event_ANaEFz6z4X38KZNpneqvk', 'error': {'type': 'invalid_request_error', 'code': 'missing_required_parameter', 'message': "Missing required parameter: 'item.call_id'.", 'param': 'item.call_id', 'event_id': None}}

If this error is not fixed, then after the function call is made the AI stops speaking, and the only way to trigger a response is for the user to say “are you there?” or something similar.
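To make this particular failure easy to spot in logs, here is a small sketch that summarizes server `error` events and flags the `item.call_id` case explicitly. The field names follow the error event logged above; `describe_error` is a hypothetical helper, and the hint text simply restates the observation in this post.

```python
from typing import Any, Dict, Optional


def describe_error(event: Dict[str, Any]) -> Optional[str]:
    """Return a readable summary if this server event is an error, else None."""
    if event.get("type") != "error":
        return None
    err = event.get("error") or {}
    summary = f"{err.get('code')}: {err.get('message')}"
    if err.get("param") == "item.call_id":
        # The tool result was rejected, so the model never receives it and goes quiet.
        summary += " (tool output was not added; resend it with the call_id)"
    return summary
```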

Additional observations:

I suspect your observation that VAD configured via session.update is not working is correct, although I have not been able to test and confirm this conclusively.
That said, I am seeing cancelled response events with a “turn_detected” reason, which tells me that VAD is working.
Do you have any such output you could share? It should look something like this:

Received event from OpenAI: {'type': 'response.done', 'event_id': 'event_ANaErQa1DdjAms2gXkcMN', 'response': {'object': 'realtime.response', 'id': 'resp_ANaErOidD7tYLMIq5pzIH', 'status': 'cancelled', 'status_details': {'type': 'cancelled', 'reason': 'turn_detected'}, 'output': [], 'usage': {'total_tokens': 0, 'input_tokens': 0, 'output_tokens': 0, 'input_token_details': {'cached_tokens': 0, 'text_tokens': 0, 'audio_tokens': 0}, 'output_token_details': {'text_tokens': 0, 'audio_tokens': 0}}}}
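If you want to check for these cancellations programmatically rather than by eye, a minimal sketch based on the fields in the logged event above could look like this (`cancelled_by_turn_detection` is a hypothetical helper name):

```python
from typing import Any, Dict


def cancelled_by_turn_detection(event: Dict[str, Any]) -> bool:
    """True if a response.done event reports cancellation due to a detected user turn."""
    if event.get("type") != "response.done":
        return False
    response = event.get("response") or {}
    details = response.get("status_details") or {}
    return response.get("status") == "cancelled" and details.get("reason") == "turn_detected"
```

Counting how often this returns True is a cheap way to confirm that server-side VAD is actually active in your session.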

Hope this helps, mate!

marcel.vanworkum · November 13, 2024, 2:01pm

I was also having issues with the Twilio Python demo when adding tools to the session. Here’s how I fixed it.

It’s important to note that when you first connect to the OpenAI websocket, it gives a default session config. This default config uses pcm16 for both the input and output audio formats. This is important because it means that if any subsequent session updates fail, the format will remain pcm16, and therefore the audio payload streamed to Twilio will sound like random static.
In your send_to_twilio function, add something like this:

if response.get("type") == "error":
    print(f"\n\n>>> Received error from OpenAI: {response}\n\n")
    # raise (rather than assert False) still halts when Python runs with -O
    raise RuntimeError(f"Received error from OpenAI: {response}")

This will capture and log any errors that we get, and the exception will halt your program so that they are extremely noticeable and easy to find in your logs.

That should get you on the right path to actually figuring out your error. Now, if you are encountering the same situation as me, then it’s likely that your session.update event is failing because of incorrectly formatted tool definitions.

For the Chat Completions API, I have a tool definition like this, which works correctly:

{
    "type": "function",
    "function": {
        "name": "some_tool_name",
        "description": "some tool description",
        "strict": True,
        "parameters": {
            "type": "object",
            "required": [...],
            "properties": {...},
            "additionalProperties": False,
        },
    },
}

But the Realtime API expects a slightly different format:

{
    # We have removed the nested "function" key,
    # and also removed the "strict" param,
    # as neither is supported in the Realtime API.
    "type": "function",
    "name": "some_tool_name",
    "description": "some tool description",
    "parameters": {
        "type": "object",
        "required": [...],
        "properties": {...},
        "additionalProperties": False,
    },
}
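If you already maintain tool definitions in the Chat Completions format, one option is to convert them rather than keep two copies. A minimal sketch, assuming the two shapes shown in this post (the nested `function` key is flattened and `strict` is dropped); `to_realtime_tool` is a hypothetical helper.

```python
from typing import Any, Dict


def to_realtime_tool(chat_tool: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a Chat Completions tool definition into the Realtime API shape."""
    if chat_tool.get("type") != "function" or "function" not in chat_tool:
        raise ValueError("expected a Chat Completions function tool")
    fn = chat_tool["function"]
    realtime_tool: Dict[str, Any] = {"type": "function", "name": fn["name"]}
    if "description" in fn:
        realtime_tool["description"] = fn["description"]
    if "parameters" in fn:
        realtime_tool["parameters"] = fn["parameters"]
    # "strict" is deliberately not copied over, per the note above.
    return realtime_tool
```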

With those updated definitions, your session.update event should now succeed. You can confirm this by listening for a session.updated event from the server and checking that it has set your input and output audio formats to g711_ulaw, at which point Twilio can play your audio correctly.
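Putting it together, here is a sketch of the session.update payload for a Twilio bridge plus a check on the server’s session.updated reply. The `input_audio_format`, `output_audio_format`, `tools` and `tool_choice` fields are taken from the beta Realtime API reference as I understand it; later API versions may name them differently, so treat this as an assumption to verify. `build_session_update` and `audio_formats_ok` are hypothetical helpers.

```python
from typing import Any, Dict, List


def build_session_update(tools: List[Dict[str, Any]]) -> Dict[str, Any]:
    """session.update event for a Twilio bridge: G.711 mu-law audio plus tools."""
    return {
        "type": "session.update",
        "session": {
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            "tools": tools,  # Realtime-format tool definitions
            "tool_choice": "auto",
        },
    }


def audio_formats_ok(event: Dict[str, Any], expected: str = "g711_ulaw") -> bool:
    """Check a session.updated event to confirm the audio formats actually changed."""
    if event.get("type") != "session.updated":
        return False
    session = event.get("session") or {}
    return (session.get("input_audio_format") == expected
            and session.get("output_audio_format") == expected)
```

If `audio_formats_ok` never returns True, the update was most likely rejected and the session is still on the pcm16 defaults, which is exactly the static-on-Twilio symptom described above.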

liquidshadowsmk · November 16, 2024, 8:31am

One of the challenges I am facing is slightly different to what’s been reported/discussed here…

Specifying function calls in session.update has always worked for me, with no issues there, and the function is being called too. But despite my sending a conversation.item.create with the function_call_output params, there is always silence after the Realtime API calls the function.

I (the user) have to say “are you there?” or something similar before the Realtime API continues the conversation. Without that, there is perpetual silence.

I can confirm that the function call execution is not failing on the server side, and the conversation.item.create event is not returning any errors from OpenAI.

Anyone else experiencing this?

mipmapper · November 19, 2024, 3:58pm

Yes, I have exactly the same problem. The function is called successfully and I send conversation.item.create with the result and the call_id. There are no errors, but there is no audio response unless I actually say something.

Any ideas on how we can solve this?

liquidshadowsmk · November 20, 2024, 3:59am

I have recently solved this with the help of @zia.khan.
The conversation.item.create event doesn’t automatically trigger a model response, so right after it we have to send a response.create event. The OpenAI documentation on this is wrong, and in my testing (after multiple attempts and burning through tokens) the only acceptable payload is this one:

(Python Implementation)

response_item = {
    "type": "response.create",
    "response": {
        "instructions": output_message,
    },
}

You must fire this event right after sending the conversation.item.create event. Let me know if it works for you.
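The two-step sequence (append the tool result, then explicitly request a response) can be wrapped in one function so it is never forgotten. A minimal sketch: `send` stands for whatever transmits a text frame on your websocket (with an async client such as `websockets` you would `await` each send instead), and `send_tool_result` is a hypothetical helper. The Realtime API reference lists the `response` object on response.create as optional, so this sends a bare event by default and only adds `instructions`, as in the payload above, when one is given.

```python
import json
from typing import Any, Callable, Dict, Optional


def send_tool_result(
    send: Callable[[str], Any],
    call_id: str,
    result: Any,
    instructions: Optional[str] = None,
) -> None:
    """Append a tool result to the conversation, then ask the model to respond."""
    send(json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": json.dumps(result),
        },
    }))
    response_create: Dict[str, Any] = {"type": "response.create"}
    if instructions is not None:
        # Optional per-response instructions, mirroring the payload above.
        response_create["response"] = {"instructions": instructions}
    send(json.dumps(response_create))  # without this the model stays silent
```

For a quick local check you can pass `list.append` as `send` and inspect the two frames it records.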

j0rdan · November 20, 2024, 7:20am

This behavior is addressed in the docs:

Adding a function call output to the conversation does not automatically trigger another model response. You can experiment with the instructions to prompt a response, or you may wish to trigger one immediately using response.create.

https://platform.openai.com/docs/guides/realtime#handling-tool-calls

I don’t think prompting the model to generate a response would work reliably (I may be wrong). Sending response.create after appending the tool call result is the way to go.
