Good morning,
I’m trying to work out the best way to build a RAG solution using the new possibilities offered by the Realtime API.
I’ve already implemented it using push-to-talk.
The problem is latency.
When the “input_audio_buffer.commit” event is sent from the client to the relay, the flow is:
User speech input => WS relay => Whisper (speech-to-text) => Vector DB search => OpenAI API => User
This works correctly, but because the vector DB search can only start once the user’s full input has been received and transcribed, it adds latency.
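The sequential flow described above can be sketched roughly as follows: each stage awaits the previous one, so their latencies add up before anything is sent back to the user. `transcribe`, `vector_search`, `generate` and `handle_commit` are hypothetical placeholders standing in for the Whisper call, the vector DB lookup and the OpenAI API call; they are not real library functions, and the toy keyword-overlap search is only an assumption for illustration.

```python
import asyncio

# Hypothetical stand-ins for the real services (assumptions, not real APIs).
# Each stage records its name in `trace` so the execution order is visible.

async def transcribe(audio: bytes, trace: list[str]) -> str:
    """Placeholder for the Whisper speech-to-text step."""
    trace.append("transcribe")
    await asyncio.sleep(0)  # a real call would await a network round trip here
    return audio.decode("utf-8")

async def vector_search(query: str, trace: list[str], k: int = 1) -> list[str]:
    """Placeholder for the vector DB lookup: ranks docs by shared words."""
    trace.append("search")
    await asyncio.sleep(0)
    docs = ["Opening hours are 9am to 5pm.", "Returns are accepted within 30 days."]
    words = set(query.lower().split())

    def score(doc: str) -> int:
        return len(words & set(doc.lower().rstrip(".").split()))

    return sorted(docs, key=score, reverse=True)[:k]

async def generate(query: str, context: list[str], trace: list[str]) -> str:
    """Placeholder for the OpenAI API call that answers using the context."""
    trace.append("generate")
    await asyncio.sleep(0)
    return f"Answer to {query!r} using {len(context)} document(s)"

async def handle_commit(audio: bytes) -> tuple[str, list[str]]:
    """Runs when the relay receives input_audio_buffer.commit (push-to-talk).
    Every stage waits for the previous one, so total latency is the sum."""
    trace: list[str] = []
    text = await transcribe(audio, trace)
    context = await vector_search(text, trace)
    answer = await generate(text, context, trace)
    return answer, trace

answer, trace = asyncio.run(handle_commit(b"what are your opening hours"))
print(trace)
print(answer)
```

Because the search depends on the finished transcript, none of these awaits can overlap as written, which is where the extra latency comes from.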
Can you think of a faster way to do this?
With VAD (voice activity detection) instead of push-to-talk, this gets even more complicated. Do you have any ideas for solutions?