WebRTC full manual control use case

I am trying to use the new WebRTC APIs. I don't want the server to respond on its own; instead, I want to get a transcript of what the user said, process it myself, and then tell the server what to say back to the user. Is this currently possible? I have tried

          turn_detection: {
            type: 'server_vad',
            threshold: 0.5,
            prefix_padding_ms: 300,
            silence_duration_ms: 500,
            create_response: false,
          }

But it has no effect that I can see, and when I get the 'session.created' data payload, create_response is set back to true.
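For reference, here is roughly the full flow I am trying to drive over the data channel. The event names are taken from the Realtime API reference, and decideWhatToSay is just a placeholder for my own processing step:

    // dc is the RTCDataChannel ("oai-events") negotiated with the Realtime API
    dc.onopen = () => {
      // Ask for VAD turn detection and transcription, but no automatic responses
      dc.send(JSON.stringify({
        type: 'session.update',
        session: {
          turn_detection: {
            type: 'server_vad',
            create_response: false, // plus the VAD tuning params above
          },
          input_audio_transcription: { model: 'whisper-1' },
        },
      }));
    };

    dc.onmessage = async (e) => {
      const event = JSON.parse(e.data);
      // Fires once the user's completed turn has been transcribed
      if (event.type === 'conversation.item.input_audio_transcription.completed') {
        const reply = await decideWhatToSay(event.transcript); // my own logic
        // Tell the model exactly what to say back to the user
        dc.send(JSON.stringify({
          type: 'response.create',
          response: { instructions: `Say exactly: "${reply}"` },
        }));
      }
    };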

If this use case is not possible, how would you handle this flow?

Use Deepgram for faster and more accurate transcriptions, and get rid of the threshold, silence duration, and prefix padding, since those settings exist to drive response activation from the input audio.
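Something like this rough sketch is what I mean, talking to Deepgram's live-transcription endpoint directly. The params and response shape are per their streaming docs; DEEPGRAM_API_KEY and handleTranscript are placeholders, and encoding/sample_rate need to match your audio:

    // Open a live-transcription socket; the ['token', key] subprotocol is
    // Deepgram's browser-friendly auth (no custom headers in a browser WebSocket)
    const dg = new WebSocket(
      'wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000',
      ['token', DEEPGRAM_API_KEY],
    );

    dg.onmessage = (e) => {
      const msg = JSON.parse(e.data);
      const text = msg.channel?.alternatives?.[0]?.transcript;
      if (msg.is_final && text) {
        handleTranscript(text); // your own processing, then tell the model what to say
      }
    };

    // Elsewhere: send each raw PCM chunk from your mic capture
    // dg.send(pcmChunk);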


Hey! We are running into the exact same issue. Did you ever find a workaround or a fix?

Wish I had better news, but no. I have not gone back to it yet, but my intention is to flirt with Deepgram until these things get ironed out. Good luck!