RAG with RealTime and Web Socket Relay (Push To Talk and VAD)

boulangerromain · October 15, 2024, 12:39pm

Good morning,

I’m trying to determine the best way to build a RAG solution with the new possibilities offered by the Realtime API.

I’ve already implemented this in push-to-talk mode.
The problem is latency:

When the “input_audio_buffer.commit” event is sent from the client to the relay, the flow is:
User speech input => WS relay => Whisper (speech to text) => vector DB search => send to OpenAI API => user

This works correctly, but because we wait for the complete user input before starting the vector DB search, it adds latency.
Can you think of a faster way to do this?

In the case of VAD operation, this is even more complicated. Do you have any ideas for solutions?
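
For reference, the push-to-talk flow described above could be sketched roughly like this, with a timer around each stage to see where the latency actually goes. This is only a sketch: `transcribe`, `vector_search`, `ask_model` and `on_commit` are hypothetical stand-ins (the sleeps just simulate latency), and nothing here calls Whisper, a real vector DB, or the OpenAI API.

```python
import asyncio
import time

# All three stages are hypothetical stand-ins; the sleeps only simulate latency.
async def transcribe(audio: bytes) -> str:
    await asyncio.sleep(0.03)
    return "what is the refund policy"

async def vector_search(query: str) -> list[str]:
    await asyncio.sleep(0.02)
    return [f"doc snippet matching '{query}'"]

async def ask_model(question: str, context: list[str]) -> str:
    await asyncio.sleep(0.05)
    return f"answer to '{question}' using {len(context)} snippet(s)"

async def on_commit(audio: bytes) -> tuple[str, dict[str, float]]:
    """Run the stages one after another and record how long each takes."""
    timings: dict[str, float] = {}

    start = time.perf_counter()
    text = await transcribe(audio)
    timings["transcribe"] = time.perf_counter() - start

    start = time.perf_counter()
    context = await vector_search(text)
    timings["vector_search"] = time.perf_counter() - start

    start = time.perf_counter()
    answer = await ask_model(text, context)
    timings["ask_model"] = time.perf_counter() - start
    return answer, timings

# Simulate the relay receiving a committed audio buffer.
answer, timings = asyncio.run(on_commit(b"\x00" * 320))
print(answer)
for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1000:.0f} ms")
```

Because each stage awaits the previous one, the total delay is the sum of all three, which is the additional latency mentioned above.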

shrey14 · October 15, 2024, 7:40pm

Hey, can you share how you did RAG with Realtime API?

vb · October 15, 2024, 7:45pm

I would add an (un)necessary talking step in between, for example: “Let me just check my files…”

That’s what I usually do when I need to look something up while I’m on the phone.
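
A minimal sketch of that idea, assuming the relay can start the lookup and the filler phrase at the same time: the retrieval runs as a background task while the filler is spoken, so the user hears something instead of silence. `speak`, `retrieve` and `answer_with_filler` are hypothetical stand-ins, not Realtime API calls, and the sleeps only simulate playback and lookup time.

```python
import asyncio

spoken: list[str] = []  # stands in for audio sent back to the user

async def speak(text: str) -> None:
    # Hypothetical stand-in for streaming synthesized speech to the client.
    spoken.append(text)
    await asyncio.sleep(0.05)

async def retrieve(query: str) -> list[str]:
    # Hypothetical stand-in for the vector DB lookup.
    await asyncio.sleep(0.03)
    return [f"snippet about {query}"]

async def answer_with_filler(query: str) -> list[str]:
    # Start the lookup first so it runs while the filler phrase plays.
    lookup = asyncio.create_task(retrieve(query))
    await speak("Let me just check my files...")
    context = await lookup
    await speak(f"Here is what I found: {context[0]}")
    return context

context = asyncio.run(answer_with_filler("refund policy"))
print(spoken)
```

The lookup still takes as long as before; the filler only hides the wait, so it helps most when retrieval finishes within the time the phrase takes to play.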
