Help needed: Setting up a voice-interactive RAG system with Realtime API and ChromaDB

Hello,
I would like to use the Realtime API in Python and implement RAG (Retrieval-Augmented Generation), but I’m having trouble understanding how to approach this.
I want the API to greet me with “Hello, what would you like to know?” Then, if I ask for example about tonight’s guest list, which is stored in a ChromaDB vector database, the API should call a search function to retrieve the results and answer me verbally.
Do I need to use speech-to-text technology like Whisper for this?
Thank you

Your best bet is function calling, which is intended for exactly these kinds of things. I personally haven’t implemented such a system yet, but the model is capable of deciding when to call a function (given sufficient prompting and a proper function definition), which lets you extract the question from the conversation; after that you are free to do anything you want.
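
As a rough illustration, here is what defining such a tool could look like when configuring the Realtime session over its websocket. This is a sketch, not a tested implementation: the tool name `search_guest_list`, its `query` parameter, and the instructions text are all made up for this example.

```python
import json

# Hypothetical tool definition sent as a "session.update" event over the
# Realtime API websocket ("ws" below is assumed to be an open connection).
session_update = {
    "type": "session.update",
    "session": {
        "instructions": (
            "Greet the user with: 'Hello, what would you like to know?' "
            "When asked about stored data, call search_guest_list."
        ),
        "tools": [
            {
                "type": "function",
                "name": "search_guest_list",  # illustrative name
                "description": "Search the ChromaDB collection for guest information.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "The user's question, rephrased as a search query.",
                        }
                    },
                    "required": ["query"],
                },
            }
        ],
        "tool_choice": "auto",  # let the model decide when to call it
    },
}
# ws.send(json.dumps(session_update))
```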

For example, build a query and perform a vector search, then take the search result and send it back to the Realtime API as the response to the function call.
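
For the vector search part, a minimal sketch using ChromaDB’s Python client (the collection name and persistence path are assumptions):

```python
import chromadb

# Open (or create) a persistent collection; path and name are illustrative.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("guest_list")

def search_guest_list(query: str, n_results: int = 3) -> str:
    """Run a vector search and flatten the matches into one string
    that can be fed back to the model."""
    results = collection.query(query_texts=[query], n_results=n_results)
    docs = results["documents"][0] if results["documents"] else []
    return "\n".join(docs)
```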

I do not see how Whisper would help you here, since we are talking about the Realtime API, which is already multimodal.

Check this: https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/voicerag-an-app-pattern-for-rag-voice-using-azure-ai-search-and/ba-p/4259116

Thanks

I’m using function calling. It launches a search in a ChromaDB collection.
I can actually obtain results, but I don’t understand how to send those results to the API so that it can answer the user from the retrieved data.
Is there API documentation that explains exactly how to proceed in Python?
Thanks

You do conversation.item.create with whatever content you want to feed the model, and then just response.create. Simple as that.

Edit: it should probably be sent as the output of the function call the model made earlier, but it could also be a standalone conversation item. Experimentation required.
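
For reference, a sketch of that flow in Python. It assumes `ws` is the open Realtime websocket, `event` is the model’s completed function call (e.g. the parsed `response.function_call_arguments.done` event), and `search_guest_list` is the hypothetical ChromaDB helper from earlier in the thread:

```python
import json

# The model sends its function-call arguments as a JSON string.
args = json.loads(event["arguments"])
result = search_guest_list(args["query"])

# 1. Attach the function result to the conversation...
ws.send(json.dumps({
    "type": "conversation.item.create",
    "item": {
        "type": "function_call_output",
        "call_id": event["call_id"],  # ties the output to the model's call
        "output": result,
    },
}))
# 2. ...then ask the model to generate a (spoken) response from it.
ws.send(json.dumps({"type": "response.create"}))
```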