Interruption not implemented out of the box in the Twilio Example

I’m thrilled to start my implementation journey! As soon as I saw the update on X, I immediately checked out the Twilio tutorial.

I was able to get a POC up and running quickly!

However, the example doesn’t cover how to handle interruptions, such as cutting off the LLM’s audio output simply by speaking, the way Advanced Voice Mode does. I’ve spent the last four hours working on my own solution (and I’ll keep at it), but I’d appreciate it if anyone could review the tutorial’s source code and suggest approaches or modifications to make interruptions work. Thanks!


The actual code for the Realtime API gives a clue here. It looks like the interruption doesn’t happen automatically: you get told that an interruption is occurring, and then you have to cancel the current generation yourself.

The Realtime API reference client has a cancelResponse() method that shows how to cancel the current generation.
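Over a raw WebSocket this boils down to sending a response.cancel event, something like this (untested sketch; openaiWs is just a placeholder for whatever socket your server opens to OpenAI):

// Sketch: ask the Realtime API to stop the response it is currently generating
openaiWs.send(JSON.stringify({ type: 'response.cancel' }));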


I’ve attempted both client methods at various server event points, along with some custom in-memory flags like userTalking: boolean, but no dice. I’ll sleep on it and attempt again first thing in the AM. My hunch is it MAY be a Twilio limitation (will look into their bidirectional streaming).

I appreciate your help.

When debugging, I noticed that the server’s audio response is in PCM16 format, which is different from the g711_ulaw we initialized in the WebSocket. Do you think this could be related to the issue?
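For reference, by “initialized” I mean the session.update sent when the OpenAI socket opens, roughly like this (sketch only; field names per the Realtime API session docs, other fields omitted):

// Sketch of the session.update the example sends on connect: both formats
// are set to g711_ulaw so audio can pass through to Twilio Media Streams directly
openaiWs.send(JSON.stringify({
  type: 'session.update',
  session: {
    input_audio_format: 'g711_ulaw',
    output_audio_format: 'g711_ulaw',
  },
}));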


No idea, my friend. But hey, I have all weekend to figure this out. I’m personally shocked Twilio would announce this partnership and fail to deliver on what feels like the base case. :man_shrugging:

I’ve been busy the last couple of days, but back when tts-1 and Whisper were released, I built my own production-ready conversation feature. While building it, I had to work with the threshold for what counts as silence.

This was challenging, since someone in a noisy environment (let’s say a café) would seem to never stop talking. So there are certainly ways to solve this, but they are not the easiest.

What I’m getting at (and you can test this in the playground) is that I think there is a parameter on the realtime endpoint for the silence threshold. Modifying that might yield better results. Then again, I haven’t had time to explore much given how much I’ve been coding at work. Maybe if I’m not too tired on Sunday I’ll check it out and be better informed.
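If someone wants to try it before I do, it should look something like this in the session.update (untested sketch; the numbers are illustrative guesses, not recommendations):

// Sketch: tune server-side VAD so background noise (the café case)
// is less likely to count as speech
openaiWs.send(JSON.stringify({
  type: 'session.update',
  session: {
    turn_detection: {
      type: 'server_vad',
      threshold: 0.6,            // higher = less sensitive to quiet sounds
      prefix_padding_ms: 300,    // audio kept from just before speech starts
      silence_duration_ms: 700,  // how long silence must last to end a turn
    },
  },
}));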

Finally, it works perfectly:

case 'input_audio_buffer.speech_started':
  console.log('Speech Start:', response.type);
  // Flush any audio Twilio has already queued for playback
  twilioWs.send(
    JSON.stringify({
      streamSid: streamSid,
      event: 'clear',
    })
  );
  console.log('Cancelling AI speech from the server');
  // Tell OpenAI to stop the in-progress generation
  const interruptMessage = {
    type: 'response.cancel',
  };
  openaiWs.send(JSON.stringify(interruptMessage));
  break;

Edit: You should look at this PR, because you need to manage interrupt handling on both sides, Twilio and OpenAI. When the user speaks and OpenAI sends input_audio_buffer.speech_started, the code in the PR clears the Twilio Media Streams buffer and sends conversation.item.truncate to OpenAI, which is very important in this case.
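Roughly, the truncate event looks like this (sketch only; lastAssistantItemId and elapsedPlaybackMs are placeholder names for state your server has to track itself, e.g. from response.audio.delta item IDs and Twilio media timestamps):

// Sketch: truncate the assistant's last audio item at the point the caller
// actually heard, so the conversation history matches what was played
openaiWs.send(JSON.stringify({
  type: 'conversation.item.truncate',
  item_id: lastAssistantItemId,   // placeholder: ID of the assistant's current item
  content_index: 0,
  audio_end_ms: elapsedPlaybackMs // placeholder: how much audio was actually played
}));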


Amazing :star_struck: I was just trying to fix this as well. Here’s your code in Python:


if response['type'] == 'input_audio_buffer.speech_started':
    print('Speech Start:', response['type'])
    
    # Send clear event to Twilio
    await websocket.send_json({
        "streamSid": stream_sid,
        "event": "clear"
    })
    
    print('Cancelling AI speech from the server')
    
    # Send cancel message to OpenAI
    interrupt_message = {
        "type": "response.cancel"
    }
    await openai_ws.send(json.dumps(interrupt_message))

Works great!

@seagermack
Amazing find! I was stuck with this issue too after integrating custom RAG into the Realtime API.

Were you able to figure out how to get the custom RAG working with the Realtime API / Twilio project? I am itching to get that figured out for myself!

Dude, this worked, thank you! I’ve been trying to figure out how to interrupt Twilio audio like this for a week now! :sweat_smile:

May I ask where in the Twilio example code this would need to be inserted? I assume it would be in the following section of the code?

// Listen for messages from the OpenAI WebSocket (and send to Twilio if necessary)
openAiWs.on('message', (data) => {
    try {
        const response = JSON.parse(data);

        if (LOG_EVENT_TYPES.includes(response.type)) {
            console.log(`Received event: ${response.type}`, response);
        }

Thanks in advance.


Has anyone been able to figure out function calling in an example with Twilio? Because I have not. Adding function calling seems to keep breaking the script.

Format it exactly like this:

  "tools": [
    {
      "type": "function",
      "name": "xxx",
      "description": "xxx",
      "parameters": {
        "type": "object",
        "properties": {
          "xxx": {
            "type": "string"
          }
        },
        "required": [
          "xxx"
        ]
      }
    }
  ]
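That tools array goes inside the session object you send with session.update (e.g. session: { tools: [...], tool_choice: 'auto', ... }), and the completed call comes back as a function_call output item. Roughly like this (untested sketch; handleXxx is a made-up placeholder for your own function, and the surrounding message handler is assumed to be async):

case 'response.done': {
  // Sketch: look for function_call items in the finished response
  for (const item of response.response?.output ?? []) {
    if (item.type !== 'function_call') continue;
    const args = JSON.parse(item.arguments);
    const result = await handleXxx(args); // placeholder for your own logic
    // Send the result back to the model...
    openAiWs.send(JSON.stringify({
      type: 'conversation.item.create',
      item: {
        type: 'function_call_output',
        call_id: item.call_id,
        output: JSON.stringify(result),
      },
    }));
    // ...and ask it to respond using that result
    openAiWs.send(JSON.stringify({ type: 'response.create' }));
  }
  break;
}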

Hey, is there a reason you clear the Twilio stream before sending the cancel message to the Realtime API?

The current implementation sends the entire audio from OpenAI to Twilio immediately, placing it in a queue for playback. As a result, there isn’t a way to cancel the playback during an interruption, because the audio has already been sent to Twilio. Clearing the Twilio stream before sending the cancel message to the Realtime API ensures that any queued audio playback is stopped before the new instruction is processed, especially because the OpenAI Realtime API generates the response much faster than Twilio plays it back. By the way, sending response.cancel is not required when using server_vad.
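In other words, with server_vad enabled the handler can be as small as this (sketch):

case 'input_audio_buffer.speech_started':
  // Per the note above: with server_vad, OpenAI stops the response on its own,
  // so only Twilio's queued playback needs to be flushed here
  twilioWs.send(JSON.stringify({
    streamSid: streamSid,
    event: 'clear',
  }));
  break;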


I saw this too; not sure why it overrides ulaw to PCM.