RAG with Realtime API and WebSocket Relay (Push-to-Talk and VAD)

Good morning,

I’m trying to determine the best way to build a RAG solution with the new possibilities offered by the Realtime API.

I’ve already implemented this in push-to-talk mode.
The problem is latency:

When the “input_audio_buffer.commit” event is sent from the client to the relay, the flow is:
User speech input => WS relay => Whisper (speech to text) => vector DB search => send to OpenAI API => user

This works correctly, but since we wait for the full user input before starting the vector DB search, it adds latency.
Can you think of a quicker way to do this?

In VAD mode, this is even more complicated. Do you have any ideas for solutions?

Hey, can you share how you did RAG with the Realtime API?

I would add an (un)necessary talking step in between, for example: “Let me just check my files…”

That’s what I usually do when I need to look something up while I’m on the phone.
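
To make the idea concrete, here is a rough sketch of running the filler phrase and the vector DB search at the same time, so the user hears something while retrieval is in progress. Everything here is an assumption for illustration: `speak_filler` and `search_vector_db` are hypothetical stand-ins that only sleep to simulate latency, not Realtime API or vector DB calls.

```python
import asyncio
import time


# Hypothetical stand-in: a real relay would stream this phrase as audio
# to the client. Here we only simulate how long speaking takes.
async def speak_filler(phrase: str, duration: float = 0.2) -> str:
    await asyncio.sleep(duration)
    return phrase


# Hypothetical stand-in: a real relay would query the vector DB here.
async def search_vector_db(query: str, duration: float = 0.2) -> list[str]:
    await asyncio.sleep(duration)
    return [f"document matching '{query}'"]


async def answer_with_filler(query: str) -> tuple[str, list[str], float]:
    start = time.perf_counter()
    # Run both tasks concurrently: the wait is roughly the longer of the
    # two durations instead of their sum.
    filler, docs = await asyncio.gather(
        speak_filler("Let me just check my files..."),
        search_vector_db(query),
    )
    elapsed = time.perf_counter() - start
    return filler, docs, elapsed


if __name__ == "__main__":
    filler, docs, elapsed = asyncio.run(answer_with_filler("refund policy"))
    print(filler, docs, f"{elapsed:.2f}s")
```

This doesn’t make the search itself faster, it only hides the wait behind speech. The retrieved documents would still need to be passed to the model before the real answer is generated.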