Here is what I have achieved so far:
I am setting up a voice bot on a real phone call. I can answer the call, open a WebSocket, and receive the initial response from the Realtime API ("Hello, how can I assist you").
Next, I can ask anything and get a response from the API that is played back on the call itself. This can go on for as long as I want.
But I am unable to interrupt the API's audio response while it is playing on the call. My understanding is that as soon as I speak on the call, input_audio_buffer.speech_started should fire and I can cancel the bot's response, but that event only fires when the bot is not playing audio.
What am i missing?
Session update:
@Override
public void onOpen(WebSocket webSocket) {
    System.out.println("Connected to WebSocket.");
    webSocket.request(1);
    // Ask for an initial greeting as soon as the socket opens.
    webSocket.sendText(new JSONObject()
            .put("type", "response.create")
            .put("response", new JSONObject()
                    .put("modalities", new JSONArray().put("text").put("audio"))
                    .put("instructions", "Assist the user."))
            .toString(), true);
    // Configure server-side VAD; the settings are nested inside a "session" object.
    webSocket.sendText(new JSONObject()
            .put("type", "session.update")
            .put("session", new JSONObject()
                    .put("turn_detection", new JSONObject()
                            .put("type", "server_vad")
                            .put("threshold", 0.5)
                            .put("prefix_padding_ms", 300)
                            .put("silence_duration_ms", 500))
                    .put("input_audio_transcription", new JSONObject()
                            .put("model", "whisper-1")))
            .toString(), true);
}
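
For context, the server VAD configured above can only detect barge-in if the caller's audio keeps flowing to the API while the bot's reply is being played back on the call. This is roughly what I mean by that; sendCallerAudio and the PCM frame source here are placeholders, not my actual capture code.

import java.net.http.WebSocket;
import java.util.Base64;
import org.json.JSONObject;

public class CallerAudioSketch {
    // Append one chunk of caller audio to the API's input buffer. This has to
    // keep running during bot playback, otherwise there is nothing for
    // server_vad to trigger speech_started on.
    static void sendCallerAudio(WebSocket webSocket, byte[] pcmChunk) {
        webSocket.sendText(new JSONObject()
                .put("type", "input_audio_buffer.append")
                .put("audio", Base64.getEncoder().encodeToString(pcmChunk))
                .toString(), true);
    }
}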
Server events:
@Override
public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
    messageBuffer.append(data);
    if (last) {
        try {
            JSONObject event = new JSONObject(messageBuffer.toString());
            if (event.has("type")) {
                String eventType = event.getString("type");
                switch (eventType) {
                    case "session.created":
                        System.out.println("Session created: " + event);
                        break;
                    case "input_audio_buffer.speech_started":
                        System.out.println("isInterrupted3 : " + isInterrupted);
                        synchronized (AzureAgiScript.class) {
                            isInterrupted = true;
                            System.out.println("isInterrupted3 : " + isInterrupted);
                        }
                        System.out.println("Caller started speaking, interrupting playback, if any.");
                        break;
                    case "response.audio.delta":
                        System.out.println("isInterrupted4 : " + isInterrupted);
                        synchronized (AzureAgiScript.class) {
                            if (isInterrupted) {
                                System.out.println("Audio delta interrupted by caller input.");
                                break;
                            }
                        }
                        // Each audio delta overwrites the same file and is handed to Asterisk Playback.
                        String base64Audio = event.getString("delta");
                        byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
                        String slnFilePath = "/astrisk_jar_setup/audio.sln24";
                        try (FileOutputStream fos = new FileOutputStream(slnFilePath)) {
                            fos.write(audioBytes);
                        }
                        channel.exec("Playback", slnFilePath.replace(".sln24", ""));
                        System.out.println("Audio delta processed");
                        break;
                    case "response.audio.done":
                        isInterrupted = true;
                        System.out.println("isInterrupted5 : " + isInterrupted);
                        System.out.println("Audio response done.");
                        break;
                    case "input_audio_buffer.speech_stopped":
                        synchronized (AzureAgiScript.class) {
                            isInterrupted = false;
                        }
                        System.out.println("Speech stopped.");
                        System.out.println("isInterrupted6 : " + isInterrupted);
                        break;
                }