Random Time Slots in Realtime Function Call

I’m playing with the Realtime API with the goal of returning available appointment slots based on a function call.

Here’s a walkthrough of how the events are being fired:

{"type"=>"response.output_item.done", "event_id"=>"event_ID", "response_id"=>"resp_ID", "output_index"=>0, "item"=>{"id"=>"item_ID", "object"=>"realtime.item", "type"=>"message", "status"=>"completed", "role"=>"assistant", "content"=>[{"type"=>"audio", "transcript"=>"Certainly! Bear with me one moment while I check the availability for you at the X clinic on Thursday, February 6th, for the Y treatment"}]}}

{"type"=>"response.function_call_arguments.done", "event_id"=>"event_ID", "response_id"=>"resp_ID", "item_id"=>"item_ID", "output_index"=>1, "call_id"=>"call_ID_ending_in_7Ud4", "name"=>"get_available_appointment_slots", "arguments"=>"{\"clinic_id\":\"INTERNAL_CLINIC_ID\",\"service_id\":\"INTERNAL_SERVICE_ID\",\"from_date\":\"2025-02-06\",\"to_date\":\"2025-02-06\"}"}

The function call returns:

{:type=>"conversation.item.create", :item=>{:type=>"function_call_output", :call_id=>"call_ID_ending_in_7Ud4", :output=>"{\"success\":true,\"data\":{\"clinic_id\":\"INTERNAL_CLINIC_ID-hg\",\"service_id\":\"INTERNAL_SERVICE_ID\",\"appointment_slots\":[{\"date\":\"2025-02-06\",\"appointment_slots\":[{\"time\":\"3:20pm\",\"provider_id\":\"INTERNAL_PROVIDER_ID\",\"provider_name\":\"Provider \"},{\"time\":\"4:35pm\",\"provider_id\":\"INTERNAL_PROVIDER_ID\",\"provider_name\":\"Provider \"}]}]},\"message\":\"Available appointment slots retrieved successfully. Do not include the provider_id in your response.\"}"}}

We then get:

{"type"=>"response.output_item.done", "event_id"=>"event_ID", "response_id"=>"resp_ID", "output_index"=>0, "item"=>{"id"=>"item_ID", "object"=>"realtime.item", "type"=>"message", "status"=>"completed", "role"=>"assistant", "content"=>[{"type"=>"audio", "transcript"=>"I found availability for you on Thursday, February 6th at our Douglas clinic for the anti-wrinkle treatment. We have a slot open at 10:00 am and another at 11:30 am. Would you like to book one of these times?"}]}}

Neither 10:00 am nor 11:30 am is in the list; the function output contained only 3:20pm and 4:35pm. I’ve also tried switching to 24-hour times, among other things. No dice. Has anyone else hit this problem?

This is likely an issue with having the model switch between text and voice, which, to the AI, is almost like speaking a different language. It is not repeating back the same tokens; it is producing audio.

Also, there is limited control over temperature. Even if you could send temperature=0, it would not reliably produce the best response; for some reason, it instead makes the audio invalid most of the time.

So, given this and the AI's inability to reliably relay data like phone numbers, I would shy away from this type of application, or use extensive language model supervision and validation in text mode, even a basic “did the AI say something that made sense” check, before the person hears the response.
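One way to approach the "did the AI say something that made sense" check is to compare the times in the response transcript against the times the tool actually returned. The following is only a minimal sketch: the helper names (`times_in`, `allowed_times`, `hallucinated_times`) are made up for illustration, it assumes the tool output has the same JSON shape as the one posted above, and it assumes slots are written in 12-hour form such as "3:20pm".

```ruby
require "json"
require "set"

# Matches times such as "10:00 am", "3:20pm", "4 pm", or "4:35 p.m." in text.
TIME_PATTERN = /\b(\d{1,2})(?::(\d{2}))?\s*([ap])\.?m\b\.?/i

# Normalizes a matched time to a canonical form such as "3:20pm".
def normalize_time(hour, minute, meridiem)
  format("%d:%02d%sm", hour.to_i, (minute || "0").to_i, meridiem.downcase)
end

# Returns every time mentioned in the text, in canonical form.
def times_in(text)
  text.scan(TIME_PATTERN).map { |h, m, ap| normalize_time(h, m, ap) }
end

# Collects the allowed times from the tool's JSON output string
# (assumed shape: data -> appointment_slots -> appointment_slots -> time).
def allowed_times(tool_output_json)
  data = JSON.parse(tool_output_json).fetch("data")
  data.fetch("appointment_slots").flat_map do |day|
    day.fetch("appointment_slots").flat_map { |slot| times_in(slot.fetch("time")) }
  end.to_set
end

# Returns the spoken times that the tool never offered (empty means OK).
def hallucinated_times(transcript, tool_output_json)
  allowed = allowed_times(tool_output_json)
  times_in(transcript).reject { |t| allowed.include?(t) }
end
```

If the returned list is non-empty, the application could discard the audio and ask for a new response instead of playing it. Note that this only helps if you buffer the audio until the transcript is complete, which adds latency.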

Thanks @_j - It’s certainly strange. What I did as a test was to use the Realtime playground to emulate the same interaction. It works. Every time. No issue.

This would normally point me to an issue with how I’m passing data back and forth to the API, but I cannot spot one after several days of looking.

I appreciate the help to diagnose!

You could verify that you are sending 24kHz 16-bit mono audio to the API, so the speaker isn’t being heard with a Mickey Mouse voice or at half-speed.
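If the audio comes from a WAV file or capture, a quick sanity check is to read the header fields before streaming the samples. This is a rough sketch that assumes a canonical 44-byte header with the `fmt ` chunk at byte 12; real files can carry extra chunks, and `parse_wav_header` and `format_mismatches` are illustrative names, not part of any library.

```ruby
# Expected input format: uncompressed PCM, mono, 24 kHz, 16-bit.
EXPECTED_FORMAT = {
  audio_format: 1, channels: 1, sample_rate: 24_000, bits_per_sample: 16
}.freeze

# Parses the format fields of a canonical WAV header (fmt chunk at byte 12).
def parse_wav_header(bytes)
  valid = bytes.byteslice(0, 4) == "RIFF" &&
          bytes.byteslice(8, 4) == "WAVE" &&
          bytes.byteslice(12, 4) == "fmt "
  raise ArgumentError, "not a canonical RIFF/WAVE header" unless valid

  # Little-endian fields: format, channels, rate, byte rate, block align, bits.
  audio_format, channels, sample_rate, _byte_rate, _block_align, bits =
    bytes.byteslice(20, 16).unpack("vvVVvv")
  { audio_format: audio_format, channels: channels,
    sample_rate: sample_rate, bits_per_sample: bits }
end

# Returns the names of the fields that differ from the expected format.
def format_mismatches(header)
  EXPECTED_FORMAT.reject { |key, value| header[key] == value }.keys
end
```

For example, `format_mismatches(parse_wav_header(File.binread("turn.wav", 44)))` (with a hypothetical file name) would return an empty array for a correctly formatted file.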

Also, you could have the application use a “configure mic” setting or setup wizard per user, so that they can see that they are setting the input recording device to the correct level. Then do some limiting and compression, or simply normalization of the whole turn if you are sending only after your own VAD.
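As an illustration of the "normalize the whole turn" option, here is a minimal peak-normalization sketch for raw 16-bit little-endian mono PCM. The function name and the 0.9 target are arbitrary choices for the example, and it does no limiting or compression.

```ruby
# Scales a whole turn of 16-bit little-endian PCM so that its loudest
# sample reaches target_peak (a fraction of full scale).
# Empty or silent input is returned unchanged.
def normalize_pcm16(bytes, target_peak: 0.9)
  samples = bytes.unpack("s<*")
  peak = samples.map(&:abs).max.to_i
  return bytes if peak.zero?

  gain = (target_peak * 32_767) / peak
  # Clamp after scaling so rounding can never overflow the 16-bit range.
  samples.map { |s| (s * gain).round.clamp(-32_768, 32_767) }.pack("s<*")
end
```

Because the gain is computed from the peak of the entire buffer, this only makes sense when you already hold the full turn, as in the "sending only after your own VAD" case.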

However, it is the text input that the model needs to follow. I can only suggest making the tool return repetitive, structured, clear, and insistent, with a mandatory instruction to repeat back exactly what appears “in quotes”.
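To make that concrete, the tool output could be reduced to just the times plus an insistent message. This is a sketch only: the `available_times` and `message` field names and the wording are invented for the example, and there is no guarantee the model will comply.

```ruby
require "json"

# Builds a function_call_output event whose output spells out the times in
# quotes and tells the model to repeat them exactly.
def build_tool_output_event(call_id, date, times)
  quoted = times.map { |t| "\"#{t}\"" }.join(", ")
  message = "The ONLY available times on #{date} are: #{quoted}. " \
            "Repeat these times exactly as written in quotes. " \
            "Do not mention any other time."
  {
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: call_id,
      # Hypothetical payload shape: only the fields the model needs to say.
      output: JSON.generate(success: true, available_times: times, message: message)
    }
  }
end
```

Dropping fields the model should never speak, such as `provider_id`, also avoids having to tell it not to include them.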


For the output data where the AI is actually saying things that do not align with the context provided, the only other suggestion I can think of is to more closely monitor the conversation history and manage the turns.

Chat Completions also has access to voice models, provided you do all your own audio work. You can send an entire input of your own design, and chats can last for an extended time, including up to one hour of chat history in which assistant messages are sent back, before past exchanges of input and response must be discarded.