RAG with Realtime API - samples / guidelines / best practices?

Are there any samples, guidelines, or best practices so far on how to do RAG efficiently with the Realtime API?
I'm interested in keeping latency low. Cost is an important factor too, of course.

You’ll want to use tools. I don’t have an example yet, but given the cost of the Realtime API you want to keep your main voice conversation as small as possible. Tools seem like the key to bridging over to RAG in a cost-effective way.

Can you mention any specific tool that might be helpful here?
I'm stuck on the same problem.

You can give it a tool called “search(query)” that does the RAG part. That tool can make another model call and return an answer that the assistant will read back.
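Here is a rough sketch of that idea in Python, not a tested integration. It only builds the JSON events you would send over the Realtime connection; the WebSocket plumbing is left out. The event shapes (`session.update`, `response.function_call_arguments.done`, `conversation.item.create` with a `function_call_output` item, `response.create`) are my understanding of the Realtime API and should be checked against the current reference. `DOCS`, `search_tool_event`, and `handle_server_event` are made-up names, and the keyword-overlap retriever is a stand-in for a real vector store or the second model call.

```python
import json
import re

# Hypothetical in-memory corpus; a real setup would query a vector store.
DOCS = [
    "Refunds are accepted within 30 days of purchase.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available by email on weekdays.",
]

# Function-tool schema for the single "search" tool (assumed shape).
SEARCH_TOOL = {
    "type": "function",
    "name": "search",
    "description": "Look up facts in the knowledge base.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def search_tool_event():
    """Client event that registers the tool; send it once after connecting."""
    return {
        "type": "session.update",
        "session": {"tools": [SEARCH_TOOL], "tool_choice": "auto"},
    }


def tokenize(text):
    return set(re.findall(r"\w+", text.lower()))


def search(query, top_k=1):
    """Toy retriever: rank documents by word overlap with the query.

    Keep the returned text short, since everything returned here is
    added to the (expensive) voice conversation.
    """
    words = tokenize(query)
    ranked = sorted(DOCS, key=lambda d: len(words & tokenize(d)), reverse=True)
    return " ".join(ranked[:top_k])


def handle_server_event(event):
    """Return the client events to send in reply to one server event."""
    if event.get("type") != "response.function_call_arguments.done":
        return []
    # Only one tool is registered, so no dispatch on the tool name here.
    args = json.loads(event["arguments"])
    answer = search(args["query"])
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": answer,
            },
        },
        # Ask the model to speak a reply that uses the tool output.
        {"type": "response.create"},
    ]
```

The point of this design is that only the short tool result enters the voice session, so retrieved documents and any extra model call stay outside the Realtime conversation and its pricing.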

Could you share some sample code?