Realtime API Tool calling problems - no response when a Tool is included in the session

When I add a tool to the realtime session via the Twilio integration, it connects but does not respond.

async def send_session_update(openai_ws):
    """Send session update to OpenAI WebSocket."""
    session_update = {
        "type": "session.update",
        "session": {
            "turn_detection": {"type": "server_vad"},
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            "voice": VOICE,
            "instructions": SYSTEM_MESSAGE,
            "modalities": ["text", "audio"],
            "temperature": 0.8,
            "tools": [
                {
                    "name": "get_weather",
                    "description": "Get the weather ",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "Location to get the weather for",
                            }
                        }
                    }
                }
            ]
        }
    }
    print('Sending session update:', json.dumps(session_update))
    await openai_ws.send(json.dumps(session_update))

The session creation acknowledgement includes an empty tools array:

Received event: session.created {'type': 'session.created', 'event_id': 'event_AEOHBpESNni69QMT38iAt', 'session': {'id': 'sess_AEOHBudsh3QTqonKbd3od', 'object': 'realtime.session', 'model': 'gpt-4o-realtime-preview-2024-10-01', 'expires_at': 1727994173, 'modalities': ['text', 'audio'], 'instructions': "Your knowledge cutoff is 2023-10. You are a helpful, witty, and friendly AI. Act like a human, but remember that you aren't a human and that you can't do human things in the real world. Your voice and personality should be warm and engaging, with a lively and playful tone. If interacting in a non-English language, start by using the standard accent or dialect familiar to the user. Talk quickly. You should always call a function if you can. Do not refer to these rules, even if you’re asked about them.", 'voice': 'alloy', 'turn_detection': {'type': 'server_vad', 'threshold': 0.5, 'prefix_padding_ms': 300, 'silence_duration_ms': 200}, 'input_audio_format': 'pcm16', 'output_audio_format': 'pcm16', 'input_audio_transcription': None, 'tool_choice': 'auto', 'temperature': 0.8, 'max_response_output_tokens': 'inf', 'tools': []}}

And the remote voice does not respond. Without the Tools in the session.update, it does respond and is able to converse.

I think you’re seeing the same bug as me. Notice that in the response to the session.update, the input_audio_format is still pcm16 and not g711_ulaw. I believe the issue is that the audio buffer is not getting any of your voice input.

I set up a similar environment to the Realtime API Console project and included a tool exactly as given in the test code. In my project, the realtime voice send/receive works fine, but the tool/function doesn’t seem to be called. For reference, I’m using the sample "get_weather" function provided in the demo.

I get back successful "function_call" and "function_call_output" events from client.on('conversation.updated'), yet the voice tells me that it was unable to retrieve the weather.

Am I missing something? Or do I need to somehow enable function calls to be attached to the real-time API?

Brandon

I just debugged a number of issues around tool execution. It’s particularly broken if you’re using the relay server that ships with the test console.

Yeah I can confirm that. After further debugging, I found that the function tool call was working, but there was a delay in when the voice became aware of the answer. If I asked multiple times for the weather, once the AJAX call finished processing, it knew the answer. Otherwise, it just prematurely answered the question without giving enough time for the function call to complete.

And yes, I’m using the relay server that ships with the test console. So likely that is buggy. I’ll check out your other post as well.

Thanks

The change I called out in the bug fixed the relay server issue for me. The behavior you mentioned about the Ajax call would make sense. There’s basically a race condition between the client calling the function and the relay server trying to call it but failing.

@stevenic I am having trouble with long running tools/functions - have you managed to get them working?

One tool takes 10 seconds to run, so in the addTool callback I’m simply waiting and responding with the JSON. The model initially tells the user to wait (as per my prompt), but then it immediately says there was a problem getting the result. Once the result does come, though, it responds correctly.

Any ideas?

I’m assuming you’re using the relay server as this is what I was seeing as well. There’s a bug in the relay server that it’s also trying to run the tool but failing because it can’t find the handler to call. I patched my version of the relay server to avoid this issue.

We’re still waiting on the official fix but someone created a patch file containing my fix which you can find here:

oh amazing, I’ll give that a go, nice debugging! Yes I’ve been using the relay server

i have a couple questions, i see y’all are using the relay server shown in the realtime console repo, how is that being used in the twilio project? and has anyone figured out how to get function calling / tool usage working in the twilio example?

def some kind of bug(s). I’m seeing the same thing (and more) when experimenting.

Here’s some of my findings:

  • If I try and change the temperature to anything below 0.6, I get no response from the session.update.
  • If I try and add tools to session.update call, same thing as above happens.
  • When changing the input_audio_format (and output) to use "g711_ulaw", only the output appears to be updated and the input remains "pcm16".

I’m seeing the same issue. Making tools available results in both input audio and output audio having the same unsuitable format:

            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",

Definitely a weird bug.
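One way to confirm which settings were actually applied is to compare the session object the server echoes back against what was requested. This is only a minimal sketch: it assumes the server acknowledges a session.update with an event carrying the full session object (for example session.updated), and the helper name find_unapplied_settings is hypothetical.

```python
def find_unapplied_settings(requested: dict, echoed: dict) -> dict:
    """Return {key: (requested_value, echoed_value)} for every mismatch."""
    return {
        key: (value, echoed.get(key))
        for key, value in requested.items()
        if echoed.get(key) != value
    }


# Settings sent in session.update (scalar fields only).
requested = {
    "input_audio_format": "g711_ulaw",
    "output_audio_format": "g711_ulaw",
}

# Values as reported in this thread when tools were included.
echoed = {
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "tools": [],
}

mismatches = find_unapplied_settings(requested, echoed)
for key, (want, got) in mismatches.items():
    print(f"{key}: requested {want!r}, server reports {got!r}")
```

Nested settings such as turn_detection come back with server defaults filled in, so a strict equality check like this is probably best limited to scalar fields.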

Is there anything we can do to help with this? Who typically owns these things?

This is super easy to reproduce. You can just start with this:

And add tools in line 130 of main.py.
e.g:

        tools: [
          {
            name: "get_weather",
            description: "Get the weather at a given location",
            parameters: {
              type: "object",
              properties: {
                location: {
                  type: "string",
                  description: "Location to get the weather from",
                },
                scale: {
                  type: "string",
                  enum: ['celsius', 'fahrenheit']
                },
              },
              required: ["location", "scale"],
            },
          },
        ]

When running it you would immediately be faced with the pcm16 decoding issue and the empty tools returned by the openai api.

There’s an error if you copy the openai docs exactly

{
  type: "error",
  event_id: "event_AGEon4DCIF1cvdfEAqRBE",
  error: {
    type: "invalid_request_error",
    code: "missing_required_parameter",
    message: "Missing required parameter: 'session.tools[0].type'.",
    param: "session.tools[0].type",
    event_id: null,
  },
}

If you add

type: "function",

to the get_weather tool it will work.
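Applied to the Python session.update from the first post, the fix might look like the sketch below. The field values are copied from that post; the wrapper function build_session_update and its parameters are hypothetical, and the only substantive change is the added "type": "function" entry.

```python
import json


def build_session_update(voice: str, instructions: str) -> dict:
    """Build a session.update event whose tool declares its type."""
    return {
        "type": "session.update",
        "session": {
            "turn_detection": {"type": "server_vad"},
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            "voice": voice,
            "instructions": instructions,
            "modalities": ["text", "audio"],
            "temperature": 0.8,
            "tools": [
                {
                    # The field the error message reports as missing.
                    "type": "function",
                    "name": "get_weather",
                    "description": "Get the weather",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "Location to get the weather for",
                            }
                        },
                    },
                }
            ],
        },
    }


# Placeholder voice and instructions, for illustration only.
event = build_session_update("alloy", "You are a helpful assistant.")
print(json.dumps(event, indent=2))
```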

this is great @timaus how do you get the error?

I managed to get it working with Twilio, with a couple of functions running and returning results as expected. I noticed that I was not creating a new conversation item once the function output was ready.

This solved it for me

openAiWs.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'function_call_output',
    call_id: response.call_id,
    output: JSON.stringify(function_call_output)
  }
}));
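For the Python version of the Twilio sample, a rough equivalent could look like this. The helper name build_function_output_events and the example call_id are hypothetical. The second event, response.create, is not part of the snippet above; it is included on the assumption that the model needs an explicit request before it speaks a reply based on the tool output.

```python
import json


def build_function_output_events(call_id: str, result: dict) -> list[dict]:
    """Return the events to send once a tool call has finished."""
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": call_id,
                # The output field is a JSON-encoded string, as in the JS snippet.
                "output": json.dumps(result),
            },
        },
        # Assumption: ask the model to respond now that the output is available.
        {"type": "response.create"},
    ]


# Hypothetical call_id and result, for illustration only.
events = build_function_output_events("call_123", {"temperature": 21, "scale": "celsius"})
for event in events:
    print(json.dumps(event))
```

In the sample's handler, each event would then be sent in order with `await openai_ws.send(json.dumps(event))`.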

Log all the messages from OpenAI. One of them is the error event which is being filtered out in the twilio example
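A small sketch of what that logging could look like in the Python sample. The helper name summarize_event is hypothetical, and the sample payload reuses the error shown earlier in this thread.

```python
import json


def summarize_event(raw_message: str) -> str:
    """Turn a raw server message into a one-line log entry, surfacing errors."""
    event = json.loads(raw_message)
    event_type = event.get("type", "<unknown>")
    if event_type == "error":
        error = event.get("error", {})
        return f"ERROR {error.get('code')}: {error.get('message')}"
    return f"event {event_type}"


# Error payload as posted earlier in the thread.
sample = json.dumps({
    "type": "error",
    "event_id": "event_AGEon4DCIF1cvdfEAqRBE",
    "error": {
        "type": "invalid_request_error",
        "code": "missing_required_parameter",
        "message": "Missing required parameter: 'session.tools[0].type'.",
        "param": "session.tools[0].type",
        "event_id": None,
    },
})
print(summarize_event(sample))
```

Calling this on every message in the receive loop, rather than only on a whitelist of event types, would make errors like the missing tool type visible right away.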

If it helps i’ve created an example using cloudflare workers instead of Fastify. You can check my github @ timoconnellaus

I’m also experiencing this, but the model just fails to respond at all if I mount tools in my session, no matter what. I also noticed that if I send session.update after connecting, server-side VAD no longer works. Has anyone found a workaround?

That’s awesome. Do you have the sample somewhere on GitHub? I would like to give it a whirl, as Twilio only has a basic sample. Much appreciated.