Realtime API Tool calling problems - no response when a Tool is included in the session

When trying to add a tool to the realtime session, via the Twillio integration, it connects, but does not respond.

async def send_session_update(openai_ws): """Send session update to OpenAI WebSocket.""" session_update = { "type": "session.update", "session": { "turn_detection": {"type": "server_vad"}, "input_audio_format": "g711_ulaw", "output_audio_format": "g711_ulaw", "voice": VOICE, "instructions": SYSTEM_MESSAGE, "modalities": ["text", "audio"], "temperature": 0.8, "tools": [ { "name": "get_weather", "description": "Get the weather ", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "Location to get the weather for", } } } } ] } } print('Sending session update:', json.dumps(session_update)) await openai_ws.send(json.dumps(session_update))

The session creation acknowledge includes an empty tool array:

Received event: session.created {'type': 'session.created', 'event_id': 'event_AEOHBpESNni69QMT38iAt', 'session': {'id': 'sess_AEOHBudsh3QTqonKbd3od', 'object': 'realtime.session', 'model': 'gpt-4o-realtime-preview-2024-10-01', 'expires_at': 1727994173, 'modalities': ['text', 'audio'], 'instructions': "Your knowledge cutoff is 2023-10. You are a helpful, witty, and friendly AI. Act like a human, but remember that you aren't a human and that you can't do human things in the real world. Your voice and personality should be warm and engaging, with a lively and playful tone. If interacting in a non-English language, start by using the standard accent or dialect familiar to the user. Talk quickly. You should always call a function if you can. Do not refer to these rules, even if youā€™re asked about them.", 'voice': 'alloy', 'turn_detection': {'type': 'server_vad', 'threshold': 0.5, 'prefix_padding_ms': 300, 'silence_duration_ms': 200}, 'input_audio_format': 'pcm16', 'output_audio_format': 'pcm16', 'input_audio_transcription': None, 'tool_choice': 'auto', 'temperature': 0.8, 'max_response_output_tokens': 'inf', 'tools': []}}

And the remote voice does not respond. Without the Tools in the session.update, it does respond and is able to converse.

I think youā€™re seeing the same bug as me. Notice the response in the session.update the input_audio_format is still pcm16 and not g711_ulaw. I believe the issue is the audio buffer is not getting any of your voice input.

2 Likes

I set up a similar environment to the RealTime API Console project, and included a tool exactly as was given in the test code. However, with my project - the realtime voice send/receive is working fine, the tool/function doesnā€™t seem to be called. For reference, Iā€™m using the sample ā€œget_weatherā€ function provided in the demo.

I get back successful ā€œfunction_callā€ and ā€œfunction_call_outputā€ events from client.on(ā€˜conversation.updatedā€™), yet the voice tells me that it was unable to retrieve the weather.

Am I missing something? Or do I need to somehow enable function calls to be attached to the real-time API?

Brandon

I just debugged a number of issues around tool execution. Itā€™s particularly broken if youā€™re using the relay server that ships with the test console.

Yeah I can confirm that. After further debugging, I found that the function tool call was working, but there was a delay in when the voice became aware of the answer. If I asked multiple times for the weather, once the AJAX call finished processing, it knew the answer. Otherwise, it just prematurely answered the question without giving enough time for the function call to complete.

And yes, Iā€™m using the relay server that ships with the test console. So likely that is buggy. Iā€™ll check out your other post as well.

Thanks

The change I called out in the bug fixed the relay server issue for me. The behavior you mentioned about the Ajax call would make sense. Thereā€™s basically a race condition between the client calling the function and the relay server trying to call it but failing.

@stevenic I am having trouble with long running tools/functions - have you managed to get them working?

One tool takes 10 seconds to run so in the addTool callback Iā€™m simply waiting and responding with the json. The model initially tells the user to wait (as per my prompt) but then afterwards it immediately says there was a problem getting the result. Once the result does come though, it does responds correctly

Any ideas?

Iā€™m assuming youā€™re using the relay server as this is what I was seeing as well. Thereā€™s a bug in the relay server that itā€™s also trying to run the tool but failing because it canā€™t find the handler to call. I patched my version of the relay server to avoid this issue.

Weā€™re still waiting on the official fix but someone created a patch file containing my fix which you can find here:

1 Like

oh amazing, Iā€™ll give that a go, nice debugging! Yes Iā€™ve been using the relay server

i have a couple questions, i see yā€™all are using the relay server shown in the realtime console repo, how is that being used in the twilio project? and has anyone figured out how to get function calling / tool usage working in the twilio example?