Realtime API Tool calling problems - no response when a Tool is included in the session

@philippeWander - a couple of observations I have made that might help:

Assumption: Python implementation.

  1. Regarding tool calls - as you know, when the websocket connection is opened, OpenAI creates the session and we have to send a session.update. Include the tools in that session.update event.
  2. When the function call happens, you will receive a series of response.function_call_arguments.delta events.
  3. Wait for the response.function_call_arguments.done event to arrive.
  4. Then you need to send a conversation.item.create event like so:
conversation_item = {
  "type": "conversation.item.create",
  "previous_item_id": None,  # You can set this appropriately
  "item": {
    "id": f"msg_{call_id}",
    "type": "function_call_output",
    "status": "completed",
    "role": "system",
    "call_id": call_id,  # Required: leaving it out triggers the 'item.call_id' error shown below
    "content": {
      "call_id": call_id,
      "name": call_name,
      "arguments": call_arguments,
      "output": output_message
    }
  }
}
  5. Then the response.done event will contain the same function call_id, with its status shown as completed.
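For comparison, here is a minimal sketch of steps 3 to 5 using the smaller item shape I understand the Realtime API reference to describe for function_call_output: just call_id plus a string output at the item level, followed by a response.create so the model continues. The helper names, the tools dict and passing the tool name in as an argument are my own assumptions for illustration, not something taken from the demo code.

```python
import json


def build_function_output_events(call_id, output):
    """Return the client events to send once a tool call has finished."""
    item_event = {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,  # must match the call_id of the function call
            "output": json.dumps(output),  # the result is sent as a JSON string
        },
    }
    # Ask the model to generate a new response that uses the tool result.
    response_event = {"type": "response.create"}
    return [item_event, response_event]


def handle_function_call(call_id, name, arguments, tools):
    """Run the requested tool and build the follow-up events.

    `tools` is a hypothetical dict mapping tool names to Python callables;
    `arguments` is the JSON string from response.function_call_arguments.done.
    """
    parsed = json.loads(arguments or "{}")
    result = tools[name](**parsed)
    return build_function_output_events(call_id, result)
```

Each returned dict would then be sent with something like await openai_ws.send(json.dumps(event)), where openai_ws stands for whatever websocket object your code already holds.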

Watch for an error event from OpenAI like so:

Received event from OpenAI: {'type': 'error', 'event_id': 'event_ANaEFz6z4X38KZNpneqvk', 'error': {'type': 'invalid_request_error', 'code': 'missing_required_parameter', 'message': "Missing required parameter: 'item.call_id'.", 'param': 'item.call_id', 'event_id': None}}

If this error is not fixed, then after the function_call is made the AI will stop speaking, and the only way to trigger a response is for the user to say “are you there” or something similar.

Additional observations:

  1. I suspect your observation that VAD configured via session.update is not working is correct, although I have not been able to conclusively test and confirm this.
  2. I am seeing cancelled response events with “turn_detected” flags which tells me that VAD is working.
  3. Do you have any such outputs you could share? It should look something like this:
Received event from OpenAI: {'type': 'response.done', 'event_id': 'event_ANaErQa1DdjAms2gXkcMN', 'response': {'object': 'realtime.response', 'id': 'resp_ANaErOidD7tYLMIq5pzIH', 'status': 'cancelled', 'status_details': {'type': 'cancelled', 'reason': 'turn_detected'}, 'output': [], 'usage': {'total_tokens': 0, 'input_tokens': 0, 'output_tokens': 0, 'input_token_details': {'cached_tokens': 0, 'text_tokens': 0, 'audio_tokens': 0}, 'output_token_details': {'text_tokens': 0, 'audio_tokens': 0}}}}
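If it helps with spotting these in your logs, here is a small sketch that flags a response.done event cancelled because a new turn was detected. It only relies on the fields visible in the output above; the function name is made up.

```python
def is_turn_detected_cancellation(event):
    """Return True if a response.done event was cancelled by turn detection."""
    if event.get("type") != "response.done":
        return False
    response = event.get("response") or {}
    details = response.get("status_details") or {}
    return (
        response.get("status") == "cancelled"
        and details.get("reason") == "turn_detected"
    )
```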

Hope this helps mate… !

I was also having issues with the Twilio Python demo when adding tools to the session. Here’s how I fixed it.

  1. When you first connect to the OpenAI websocket, it gives you a default session config that uses pcm16 for both the input and output audio formats. This matters because if any subsequent session update fails, the format remains pcm16, and the audio payload streamed to Twilio will sound like random static.
  2. In your send_to_twilio function, add something like this:
if response.get("type") == "error":
  print(f"\n\n>>> Received error from OpenAI: {response}\n\n")
  assert False, "Received error from OpenAI"

This will capture and log any errors that we get, and the assert will halt your program so that they are extremely noticeable and easy to find in your logs.
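One caveat: assert statements are skipped when Python runs with the -O flag. If that matters for you, a variant that raises an exception does the same job. This is only a sketch; the exception class and function name are made up, and the fields read from the event are the ones visible in the error output earlier in this thread.

```python
class RealtimeAPIError(RuntimeError):
    """Raised when OpenAI sends an event of type 'error' (hypothetical name)."""


def raise_on_error_event(response):
    """Raise a descriptive exception if the event is an error; otherwise do nothing."""
    if response.get("type") != "error":
        return
    error = response.get("error") or {}
    raise RealtimeAPIError(
        f"{error.get('code')}: {error.get('message')} (param={error.get('param')})"
    )
```

You would call raise_on_error_event(response) in the same place as the check above.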

  3. That should get you on the right path to figuring out your error. If you are in the same situation I was, it’s likely that your session.update event is failing because of incorrectly formatted tool definitions.

For the chat completions API, I have a tool definition something like this which works correctly:

{
    "type": "function",
    "function": {
        "name": "some_tool_name",
        "description": "some tool description",
        "strict": True,
        "parameters": {
            "type": "object",
            "required": [...],
            "properties": {...},
            "additionalProperties": False,
        },
    },
}

But the Realtime API expects a slightly different format:

{
    # We have removed the nested "function" key,
    # and also removed the "strict" param,
    # as neither is supported in the Realtime API.
    "type": "function", 
    "name": "some_tool_name",
    "description": "some tool description",
    "parameters": {
        "type": "object",
        "required": [...],
        "properties": {...},
        "additionalProperties": False,
    },
}
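If you already have a list of chat completions tool definitions, a small converter saves rewriting them by hand. This is a sketch of the flattening described above; the function name is made up, and it assumes every input tool has the nested "function" key.

```python
def to_realtime_tool(chat_tool):
    """Flatten a chat completions tool definition into the Realtime API shape."""
    function = chat_tool["function"]
    return {
        "type": "function",
        "name": function["name"],
        "description": function.get("description", ""),
        # "strict" is deliberately not copied across.
        "parameters": function.get(
            "parameters", {"type": "object", "properties": {}}
        ),
    }
```

Usage would be something like realtime_tools = [to_realtime_tool(t) for t in chat_tools].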

With those updated definitions, your session.update event should now succeed; you can confirm this by listening for a session.updated event from the server. That event should show your input and output audio formats set to g711_ulaw, at which point Twilio can play your audio correctly.
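To make that check explicit, here is a rough sketch that builds the session.update payload and verifies the matching session.updated event. The function names are made up, and the session field names (input_audio_format, output_audio_format, tools) are the ones I believe the Realtime API used at the time of this thread, so double check them against the current API reference.

```python
def build_session_update(tools, audio_format="g711_ulaw"):
    """Build a session.update event with Twilio-compatible audio formats."""
    return {
        "type": "session.update",
        "session": {
            "input_audio_format": audio_format,
            "output_audio_format": audio_format,
            "tools": tools,
        },
    }


def session_update_applied(event, expected_format="g711_ulaw"):
    """Return True if a session.updated event reports the expected audio formats."""
    if event.get("type") != "session.updated":
        return False
    session = event.get("session") or {}
    return (
        session.get("input_audio_format") == expected_format
        and session.get("output_audio_format") == expected_format
    )
```

If session_update_applied never returns True for any incoming event, look for an error event instead; the session is most likely still on the pcm16 defaults.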