Realtime API Tool calling problems - no response when a Tool is included in the session

@philippeWander - a couple of observations I have made that might help:

Assumption: Python implementation.

  1. Regarding tool calls - as you know, when the websocket connection is opened, OpenAI creates the session and we have to send a session.update. Include the tools in that session.update event.
  2. When the function call happens, you will receive a stream of response.function_call_arguments.delta events.
  3. Wait for the response.function_call_arguments.done event to arrive.
  4. Then send a conversation.item.create event like so:
conversation_item = {
    "type": "conversation.item.create",
    "previous_item_id": None,  # optional; set it to insert after a specific item
    "item": {
        "type": "function_call_output",
        "call_id": call_id,        # must match the call_id from the done event
        "output": output_message,  # must be a string; JSON-encode structured results
    },
}
  5. Then the response.done event will contain the same function call_id with status completed.
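Steps 2–4 above can be sketched as a couple of small payload helpers. This is a minimal sketch; the function names are my own, not part of the API:

```python
def accumulate_arguments(delta_events):
    """Concatenate the argument fragments carried by a run of
    response.function_call_arguments.delta events into one JSON string."""
    return "".join(event["delta"] for event in delta_events)


def build_function_call_output(call_id, output_message):
    """Build the conversation.item.create event that returns the tool
    result to the model. 'output' must be a string, so JSON-encode any
    structured result before passing it in."""
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,  # must match the call_id from the done event
            "output": output_message,
        },
    }
```

Once the done event arrives, serialize the helper's return value with `json.dumps` and send it over the websocket.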

Watch for an error event from OpenAI like so:

Received event from OpenAI: {'type': 'error', 'event_id': 'event_ANaEFz6z4X38KZNpneqvk', 'error': {'type': 'invalid_request_error', 'code': 'missing_required_parameter', 'message': "Missing required parameter: 'item.call_id'.", 'param': 'item.call_id', 'event_id': None}}

If this error is not fixed, then after the function_call is made, the AI will stop speaking, and the only way to trigger a response is for the user to say "are you there?" or something similar.

Additional observations:

  1. I suspect your observation regarding VAD via session_update not working is correct, although I have not been able to conclusively test and confirm this.
  2. I am seeing cancelled response events with "turn_detected" flags, which tells me that VAD is working.
  3. Do you have any such outputs you could share? It should look something like this:
Received event from OpenAI: {'type': 'response.done', 'event_id': 'event_ANaErQa1DdjAms2gXkcMN', 'response': {'object': 'realtime.response', 'id': 'resp_ANaErOidD7tYLMIq5pzIH', 'status': 'cancelled', 'status_details': {'type': 'cancelled', 'reason': 'turn_detected'}, 'output': [], 'usage': {'total_tokens': 0, 'input_tokens': 0, 'output_tokens': 0, 'input_token_details': {'cached_tokens': 0, 'text_tokens': 0, 'audio_tokens': 0}, 'output_token_details': {'text_tokens': 0, 'audio_tokens': 0}}}}
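When scanning your logs for these, a small predicate like the following can flag VAD cancellations. A sketch only; the helper name is mine:

```python
def was_cancelled_by_turn(event):
    """True when a response.done event reports that the response was
    cancelled because server-side VAD detected the user speaking."""
    if event.get("type") != "response.done":
        return False
    details = event.get("response", {}).get("status_details") or {}
    return details.get("type") == "cancelled" and details.get("reason") == "turn_detected"
```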

Hope this helps, mate!

I was also having issues with the Twilio Python demo when adding tools to the session. Here's how I fixed it.

  1. It's important to note that when you first connect to the OpenAI websocket, it gives a default session config. This default config uses pcm16 for both the input and output audio formats. This is important because it means that if any subsequent session updates fail, the format will remain pcm16, and therefore the audio payload streamed to Twilio will sound like random static.
  2. In your send_to_twilio function, add something like this:
if response.get("type") == "error":
  print(f"\n\n>>> Received error from OpenAI: {response}\n\n")
  assert False, "Received error from OpenAI"

This will capture and log any errors that we get, and the assert will halt your program so that they are extremely noticeable and easy to find in your logs.

  3. That should get you on the right path to actually figuring out your error. Now, if you are encountering the same situation as me, then it's likely that your session.update event is failing because of incorrectly formatted tool definitions.

For the Chat Completions API, I have a tool definition like this, which works correctly:

{
    "type": "function",
    "function": {
        "name": "some_tool_name",
        "description": "some tool description",
        "strict": True,
        "parameters": {
            "type": "object",
            "required": [...],
            "properties": {...},
            "additionalProperties": False,
        },
    },
}

But the Realtime API expects a slightly different format:

{
    # The nested "function" key and the "strict" param have been
    # removed; neither is supported in the Realtime API.
    "type": "function",
    "name": "some_tool_name",
    "description": "some tool description",
    "parameters": {
        "type": "object",
        "required": [...],
        "properties": {...},
        "additionalProperties": False,
    },
}
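If you already maintain tool definitions in the Chat Completions shape, you can flatten them mechanically rather than keeping two copies. A sketch under the assumption above (the converter name is mine):

```python
def chat_tool_to_realtime(tool):
    """Flatten a Chat Completions tool definition into the shape the
    Realtime API expects: hoist the contents of the nested "function"
    key to the top level and drop the unsupported "strict" flag."""
    fn = tool["function"]
    return {
        "type": "function",
        "name": fn["name"],
        "description": fn.get("description", ""),
        "parameters": fn["parameters"],
    }
```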

With those updated definitions, when you send your session.update event it should succeed; you can confirm this by listening for a session.updated event from the server. Inspect that event to verify your input and output audio formats are set to g711_ulaw, after which Twilio will play your audio correctly.
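For reference, the session.update payload described above can be built like this (a sketch; the builder name is mine, and the tools list is assumed to already be in the flattened Realtime format):

```python
def build_session_update(realtime_tools):
    """session.update event that sets Twilio-compatible audio formats
    and registers the tools; watch for session.updated to confirm."""
    return {
        "type": "session.update",
        "session": {
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            "tools": realtime_tools,
            "tool_choice": "auto",
        },
    }
```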

One of the challenges I am facing is slightly different from what's been reported/discussed here…

Specifying function calls in session.update has always worked for me; no issues there. The function is being called too. But despite my sending a conversation.item.create with the function_call_output params, there's always silence while the function is being executed by the Realtime API.

I (the user) have to say "are you there?" or something, and the Realtime API continues the conversation. Without that, there's perpetual silence.

I can confirm that the function call execution is not failing on the server side, and the conversation.item.create event is also not returning any errors from OpenAI.

Anyone else experiencing this?

Yes, I have exactly the same problem: the function calls successfully, I send conversation.item.create with the response and the call_id, and there are no errors but no audio response, unless I actually say something.

Any ideas how we can solve this…

I have recently solved this with the help of @zia.khan
So after the conversation.item.create, which doesn't automatically trigger a model response, we have to send a response.create event. The OpenAI documentation on this is wrong, and the only acceptable payload (after multiple attempts and burning through tokens testing this) is as follows:

(Python Implementation)

response_item = {
    "type": "response.create",
    "response": {
        "instructions": output_message,
    },
}

You must fire this event right after sending the conversation.item.create event. Let me know if it works for you.
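Putting the two posts together, the pair of events to send once your tool finishes can be sketched like so (the helper name is mine; the ordering is the point):

```python
def build_followup_events(call_id, output_message):
    """The two events to send after executing a tool call: first the
    function_call_output item, then response.create, because adding the
    output alone does not trigger the model to speak again."""
    item_event = {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": output_message,
        },
    }
    response_event = {
        "type": "response.create",
        "response": {
            "instructions": output_message,
        },
    }
    return [item_event, response_event]
```

Send both over the websocket in that order, serializing each with `json.dumps`.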


This behavior is addressed in the docs:

Adding a function call output to the conversation does not automatically trigger another model response. You can experiment with the instructions to prompt a response, or you may wish to trigger one immediately using response.create.

https://platform.openai.com/docs/guides/realtime#handling-tool-calls

I don't think prompting the model to generate a response would work reliably (I may be wrong). Sending response.create after appending the tool call result is the way to go.