Realtime API does not trigger a response after the "conversation.item.create" event

Hi everyone,

I’ve been experimenting with Function Calling using the Realtime API, and I’ve hit a bit of a snag. I’m trying to get the AI model to send audio back immediately after I send this event:

{
  type: "conversation.item.create",
  item: {
    type: "function_call_output",
    call_id: callId,
    output: myOutput
  },
}

Here’s what’s happening: if I say something simple like “hey,” it does trigger the model to generate a response. That response is based on my function_call_output, so the event itself is successfully sent.

The problem is that sending this event alone doesn’t trigger a response: the model seems to reply only to voice input, and I haven’t found a way to make it respond purely through events.

For reference, I didn’t set turn_detection to anything, just to avoid potential complications.
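
(In case it matters: my understanding is that leaving turn_detection unset keeps the default server VAD on, which would explain why spoken input still triggers replies. If I read the docs right, explicitly turning it off would look roughly like this:)

// Hedged sketch: explicitly disable server-side turn detection via session.update,
// so the model only replies when a response is explicitly requested.
const sessionUpdate = {
  type: 'session.update',
  session: {
    turn_detection: null, // null should disable server VAD
  },
};
// send it over whatever transport you use, e.g.
// ws.send(JSON.stringify(sessionUpdate)) or dataChannel.send(JSON.stringify(sessionUpdate))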

If anyone has insights or suggestions on how to tackle this, I’d really appreciate the help!

Thanks so much! :blush:

I figured it out! As someone else mentioned in another post, the solution is to send another event, right away:

{
  type: 'response.create',
  response: {
    instructions: "Reply based on the function's output."
  }
}

It works, but still, I wonder why we really need to do this. If you have an explanation of the logic behind it, I am definitely interested. Thanks, and I hope it will be helpful for others :slight_smile:


Hey! Can you help me?

I have the same issue: after the function call, response.create gets the API to respond to the user, but then the conversation stops and the API stops talking.

Any suggestions?

I had this issue and figured out that the function call output should be stringified JSON, otherwise the AI won't start responding even after the response.create event.

Solution:

  1. Put your function calling result into the conversation
  2. Trigger the model to continue
// 1. put your function calling result into the conversation
const response = {
  type: 'conversation.item.create',
  item: {
    type: 'function_call_output',
    call_id: output01.call_id,
    // stringify the function call result first
    output: JSON.stringify(fnResponse),
  },
};

dataChannel?.send(JSON.stringify(response));

// 2. Trigger the model to continue
const continueResponse = {
  type: 'response.create',
};

dataChannel?.send(JSON.stringify(continueResponse));

Hey! Thanks for this.

I found a workaround before this. I noticed that the conversation.item.create wasn’t completing for some reason (I’d get in_progress in the conversation.item.created event from the API), so instead of conversation.item.create, I pass my function output in the response.create event itself, and that works!
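
Roughly, the workaround looks like this (just a sketch; result is whatever my function returned and dataChannel is the channel to the API):

// Sketch of the workaround: skip conversation.item.create entirely and fold the
// tool result into the instructions of response.create.
const evt = {
  type: 'response.create',
  response: {
    instructions: `Answer the user using this tool result: ${JSON.stringify(result)}`,
  },
};
dataChannel.send(JSON.stringify(evt));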

Something I’ve learned from the Assistants API: response.create should only be sent after all functions have been executed, since the AI may call multiple tools, or the same tool multiple times, in a single run. Sending response.create right after a single tool response doesn’t always work as expected. But I’m still looking for the answer myself: which event should I use to know that all tools have been called, so I can send response.create?
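
One approach that might work (I haven’t confirmed it is the intended one): wait for response.done, which should list every function_call item the model produced in that turn, answer all of them, and only then send a single response.create. A rough sketch, assuming fns maps tool names to local functions and dataChannel is the channel/socket to the API:

// Rough sketch (untested): answer every tool call from the finished response,
// then trigger exactly one follow-up response.
async function onServerEvent(msg) {
  if (msg.type !== 'response.done') return;

  // response.output lists the items of the finished response,
  // including any function_call items the model produced.
  const toolCalls = (msg.response?.output ?? [])
    .filter((item) => item.type === 'function_call');
  if (toolCalls.length === 0) return;

  // Run each tool locally and send its output back as a conversation item.
  for (const call of toolCalls) {
    const fn = fns[call.name];
    if (!fn) continue; // unknown tool name, skip
    const result = await fn(JSON.parse(call.arguments));
    dataChannel.send(JSON.stringify({
      type: 'conversation.item.create',
      item: {
        type: 'function_call_output',
        call_id: call.call_id,
        output: JSON.stringify(result),
      },
    }));
  }

  // Only after every tool output is in, ask the model to continue.
  dataChannel.send(JSON.stringify({ type: 'response.create' }));
}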

I’m facing the same issue. After sending the function call result to conversation.item.create, I receive the conversation.item.created event, but shortly after, I get the output_audio_buffer.stopped event. How can I resolve this?

// Connection state shared by the functions below
let peerConnection = null;   // RTCPeerConnection to the Realtime API
let dataChannel = null;      // data channel used to exchange JSON events
let isWebRTCActive = false;  // whether a session is currently running
const baseUrl = '';          // base URL of the backend exposing /api/rtc-connect (set as appropriate)

// Define an object that contains multiple functions; methods in fns will be called
const fns = {
// Get the HTML content of the current page
getPageHTML: () => {
    return {
        success: true,
        html: document.documentElement.outerHTML
    }; // Return the entire page's HTML
},

// Function to get the interview question
get_question: ({ question_number }) => {
    // List of 18 predefined questions
    console.log(`Fetching question for question_number ${question_number}`);
    const questions = [
        "What are the key features of Python as a programming language?",
        "How does Python's memory management work?",
        "Given a list of numbers, how would you write a Python function to find the second largest number?",
        "What is data engineering and how does it differ from data science?",
        "How do you ensure data quality in a data pipeline?",
        "You are tasked with designing a data pipeline for a new application. What steps would you take to ensure its efficiency and reliability?",
        "What is a data lake and how does it differ from a data warehouse?",
        "How do you manage data governance in a data lake environment?",
        "You have a data lake filled with raw data. How would you approach extracting meaningful insights from it?",
        "What inspired you to pursue a career in data science, and how does this role align with your long-term career goals?",
        "How do you define a successful team dynamic, and what role do you typically play in a team setting?",
        "Can you explain the difference between lists and tuples in Python?",
        "Can you explain the concept of decorators in Python and provide an example of their use?",
        "What are the key differences between structured, semi-structured, and unstructured data?",
        "Can you describe a time when you had to adapt to a significant change in your work environment? How did you handle it?",
        "What values do you believe are essential for fostering a positive workplace culture, and how do they align with Company's values?",
        "How would you address performance issues when querying large datasets in a data lake?",
        "What are the best practices for organizing data within a data lake?"
    ];

    if (1 <= question_number && question_number <= questions.length) {
        return questions[question_number - 1];
    } else {
        return null;
    }
}

};

// When an audio stream is received, add it to the page and play it
function handleTrack(event) {
const el = document.createElement('audio'); // Create an audio element
el.srcObject = event.streams[0]; // Set the audio stream as the element's source
el.autoplay = el.controls = true; // Autoplay and display audio controls
document.body.appendChild(el); // Add the audio element to the page
}

// Create a data channel for transmitting control messages (such as function calls)
function createDataChannel() {
// Create a data channel named 'response'
dataChannel = peerConnection.createDataChannel('response');
console.log('Creating data channel: response');

// Configure data channel events
dataChannel.addEventListener('open', () => {
    console.log('Data channel opened');
    configureData(); // Configure data channel functions
});

// Handle incoming messages on the data channel
dataChannel.addEventListener('message', async (ev) => {
    console.log('Message received on data channel:', ev.data); // Log the received message

    const msg = JSON.parse(ev.data); // Parse the received message
    console.log('Parsed message:', msg); // Log the parsed message

    // Check if the message type indicates a function call request
    if (msg.type === 'response.function_call_arguments.done') {
        const fn = fns[msg.name]; // Get the corresponding function by name
        if (fn !== undefined) {
            console.log(`Calling local function ${msg.name}, parameters ${msg.arguments}`);
            try {
                const args = JSON.parse(msg.arguments); // Parse function parameters
                console.log('Function arguments:', args); // Log parsed arguments

                const result = await fn(args); // Call the local function and wait for the result
                console.log('Function result:', result); // Log the function result

                // Prepare the result to be sent
                const event = {
                    type: 'conversation.item.create', // Event type for creating a conversation item
                    item: {
                        type: 'function_call_output', // Specify the output type
                        call_id: msg.call_id, // Passed call_id from the original message
                        output: JSON.stringify(result), // JSON string of the function execution result
                    },
                };
                
                // Log the event that will be sent
                console.log('Sending result back via data channel:', event);
                dataChannel.send(JSON.stringify(event)); // Send the result back to the remote side
            } catch (error) {
                console.error('Error while calling function:', error); // Log any errors
            }
        } else {
            console.warn(`Function ${msg.name} not found!`); // Log a warning if the function is not found
        }
    } else {
        console.warn('Received unsupported message type:', msg.type); // Log unsupported message types
    }
});

// Handle the data channel being closed
dataChannel.addEventListener('close', () => {
    console.log('Data channel closed');
});

// Handle data channel errors
dataChannel.addEventListener('error', (error) => {
    console.error('Data channel error:', error);
});

}

// Configure data channel functions and tools
function configureData() {
console.log('Configuring data channel');
const event = {
    type: 'session.update', // Session update event
    session: {
        modalities: ['text', 'audio'], // Supported interaction modes: text and audio
        // Provide function tools; these tool names must match the keys in the fns object above
        tools: [
            {
                type: 'function',
                name: 'getPageHTML',
                description: 'Get the HTML content of the current page',
            },
            {
                type: 'function',
                name: 'get_question', // Include the get_question function
                description: 'Get question for the interview using question number',
                parameters: {
                    type: 'object',
                    properties: {
                        question_number: { type: 'number' }
                    },
                    required: ['question_number']
                }
            }
        ],
    },
};
dataChannel.send(JSON.stringify(event)); // Send the configured event data
}

// Get the control button element
const toggleButton = document.getElementById('toggleWebRTCButton');
// Add a click event listener to the button to toggle the WebRTC connection state
toggleButton.addEventListener('click', () => {
    // If WebRTC is active, stop the connection; otherwise, start WebRTC
    if (isWebRTCActive) {
        stopWebRTC(); // Stop WebRTC
        toggleButton.textContent = 'start'; // Update button text
    } else {
        startWebRTC(); // Start WebRTC
        toggleButton.textContent = 'stop'; // Update button text
    }
});

let micStream = null; // Store the microphone stream globally
let isMicEnabled = true; // Track the mic state

// Toggle the microphone on and off
function toggleMic() {

if (micStream) {
    micStream.getTracks().forEach((track) => {
        track.enabled = !track.enabled;
    });
    isMicEnabled = !isMicEnabled;
    toggleMicButton.textContent = isMicEnabled ? "Mute Mic" : "Unmute Mic";
    console.log(isMicEnabled ? "Microphone enabled" : "Microphone muted");
}

}

// Start WebRTC connection
function startWebRTC() {
if (isWebRTCActive) return;

peerConnection = new RTCPeerConnection();
peerConnection.ontrack = handleTrack;
createDataChannel();

// Get audio stream from the microphone
navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
    micStream = stream; // Store the stream globally
    stream.getTracks().forEach((track) => {
        track.enabled = true; // Ensure the mic is initially enabled
        peerConnection.addTransceiver(track, { direction: 'sendrecv' });
    });

    // Create an offer and send it to the backend
    peerConnection.createOffer().then((offer) => {
        peerConnection.setLocalDescription(offer);
        fetch(baseUrl + '/api/rtc-connect', {
            method: 'POST',
            body: offer.sdp,
            headers: {
                'Content-Type': 'application/sdp',
            },
        })
        .then((r) => r.text())
        .then((answer) => {
            peerConnection.setRemoteDescription({ sdp: answer, type: 'answer' });
        });
    });
});

isWebRTCActive = true;

}

// Add a click event listener to the mic toggle button
const toggleMicButton = document.getElementById('toggleMicButton');
toggleMicButton.addEventListener('click', toggleMic);

// Stop the WebRTC connection and clean up all resources
function stopWebRTC() {
// If WebRTC is not active, return directly
if (!isWebRTCActive) return;
// Stop the received audio tracks
const tracks = peerConnection.getReceivers().map(receiver => receiver.track);
tracks.forEach(track => track.stop());
// Close the data channel and WebRTC connection
if (dataChannel) dataChannel.close();
if (peerConnection) peerConnection.close();
// Reset connection and channel objects
peerConnection = null;
dataChannel = null;
// Mark WebRTC as not active
isWebRTCActive = false;
}