Any samples / guidelines / best practices so far on how to do RAG efficiently with the Realtime API?
Interested in keeping latency low. Cost is an important factor too, of course.
You’ll want to use tools. I don’t have an example yet, but given the cost of the Realtime API you want to keep your main voice conversation as small as possible. Tools seem like the key to bridging over to RAG in a way that’s cost-effective.
Can you mention any specific tool that might be helpful here?
I am stuck on the same problem.
You can give it a tool called “search(query)” that does the RAG part. That tool can make another model call (or hit your retrieval pipeline) and return an answer that the assistant will read back.
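Roughly, registering that tool looks something like the sketch below. This is just a minimal illustration over the Realtime WebSocket API, assuming the `ws` package, the `session.update` event shape from the docs, and a placeholder model name; double-check those details against the current docs before relying on them.

```ts
// Sketch: register a "search(query)" function tool on a Realtime API session.
// Assumes OPENAI_API_KEY is set and the model name below is still current.
import WebSocket from "ws";

const url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview";
const ws = new WebSocket(url, {
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", () => {
  // Register the tool; the model decides when to call it during the voice conversation.
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: {
        instructions:
          "When the user asks a factual question, call search(query) and read back the answer.",
        tools: [
          {
            type: "function",
            name: "search",
            description: "Look up information in the knowledge base (RAG).",
            parameters: {
              type: "object",
              properties: {
                query: { type: "string", description: "The search query." },
              },
              required: ["query"],
            },
          },
        ],
        tool_choice: "auto",
      },
    })
  );
});
```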
Can I request some sample code from you?
I open-sourced this: GitHub - adorosario/openai-realtime-with-customgpt-poc: POC Using OpenAI Realtime API with CustomGPT for RAG And Twilio Voice
You can rip out the CustomGPT.ai RAG if you want and replace it with whatever endpoint your RAG lives behind.
Latency is going to be an issue for sure (especially if your RAG is not super fast), so I had to implement a UX “typing” sound (similar to the “…” GIF in text chatbots - LOL!)
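The endpoint swap is basically just the tool-call handler. Here is a rough sketch of that side, assuming the Realtime event names from the docs at the time (`response.function_call_arguments.done`, `conversation.item.create`, `response.create`) and a hypothetical `RAG_ENDPOINT` env var standing in for CustomGPT.ai or whatever backend you use; this is illustrative, not the POC’s exact code.

```ts
// Sketch: when the model invokes search(query), forward the query to your own
// RAG endpoint and hand the result back so the assistant can read it aloud.
// `ws` is the open Realtime WebSocket from the earlier snippet.
ws.on("message", async (raw: Buffer) => {
  const event = JSON.parse(raw.toString());

  // Fires once the model has finished streaming the tool-call arguments.
  if (event.type === "response.function_call_arguments.done" && event.name === "search") {
    const { query } = JSON.parse(event.arguments);

    // This lookup is where the latency hides; keep the RAG backend fast,
    // or play a filler/"typing" sound on the client while waiting.
    // RAG_ENDPOINT and the `answer` field are assumptions about your backend.
    const res = await fetch(process.env.RAG_ENDPOINT!, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query }),
    });
    const { answer } = await res.json();

    // Return the tool output to the conversation...
    ws.send(
      JSON.stringify({
        type: "conversation.item.create",
        item: {
          type: "function_call_output",
          call_id: event.call_id,
          output: JSON.stringify({ answer }),
        },
      })
    );
    // ...and ask the model to generate (speak) a response based on it.
    ws.send(JSON.stringify({ type: "response.create" }));
  }
});
```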