I am trying to use the new WebRTC APIs. I do not want the server to respond on its own; instead, I want to get a transcript of what the user said, process it myself, and then tell the server what to say back to the user. Is this currently possible? I have tried:
turn_detection: {
  type: 'server_vad',
  threshold: 0.5,
  prefix_padding_ms: 300,
  silence_duration_ms: 500,
  create_response: false,
}
But it has no effect that I can see, and when I receive the ‘session.created’ data payload, it shows create_response set to true.
If this use case is not possible, how would you handle this flow?
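
To make the intended flow concrete, here is a rough sketch of what I have in mind. It is not confirmed to work. It assumes the settings can be applied by sending a `session.update` event over the data channel once `session.created` arrives, that the transcript arrives in a `conversation.item.input_audio_transcription.completed` event, and that a `response.create` event with `instructions` makes the server speak. The `whisper-1` transcription model and the exact field layout are my reading of the docs and may be wrong; `processTranscript` and the other function names are hypothetical placeholders.

```typescript
// Loose event type; the field names below are assumptions, not a confirmed schema.
type RealtimeEvent = { type: string; [key: string]: unknown };

// Build a session.update that keeps server VAD but turns off automatic responses.
function buildSessionUpdate(): RealtimeEvent {
  return {
    type: "session.update",
    session: {
      // Assumed to be required for transcript events to be emitted.
      input_audio_transcription: { model: "whisper-1" },
      turn_detection: {
        type: "server_vad",
        threshold: 0.5,
        prefix_padding_ms: 300,
        silence_duration_ms: 500,
        create_response: false,
      },
    },
  };
}

// Hypothetical placeholder for my own processing of the user's transcript.
function processTranscript(transcript: string): string {
  return `You said: ${transcript.trim()}`;
}

// Build a response.create that tells the server what to say back.
function buildSpokenReply(text: string): RealtimeEvent {
  return {
    type: "response.create",
    response: {
      modalities: ["audio", "text"],
      instructions: `Say exactly the following to the user: ${text}`,
    },
  };
}

// Decide which client events to send in reaction to one server event.
function handleServerEvent(event: RealtimeEvent): RealtimeEvent[] {
  if (event.type === "session.created") {
    return [buildSessionUpdate()];
  }
  if (
    event.type === "conversation.item.input_audio_transcription.completed" &&
    typeof event.transcript === "string"
  ) {
    return [buildSpokenReply(processTranscript(event.transcript))];
  }
  return [];
}
```

In the browser this would be wired to the `RTCDataChannel` roughly as `dc.onmessage = (e) => handleServerEvent(JSON.parse(e.data)).forEach((ev) => dc.send(JSON.stringify(ev)));`, where `dc` is the data channel opened on the peer connection.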