How to implement a real-time flow (voice/text) + vector store (file_search) for fluid audio+text responses?

Hello community,

I am developing a chatbot that requires the following functionality: accept voice or text input, use a file search tool (vector store/uploaded documents) for context retrieval, and generate a fluid response in text and audio format (ideally with streaming).

I have successfully implemented this using the text API (Responses API) and the vector store for text-to-text flow.
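For reference, this is roughly what my working text-to-text setup looks like — a minimal sketch only, where the model name and the vector store ID (`vs_example123`) are placeholders for my own values:

```python
from openai import OpenAI

client = OpenAI()

# Placeholder: ID of the vector store created from my uploaded documents
VECTOR_STORE_ID = "vs_example123"

response = client.responses.create(
    model="gpt-4o-mini",  # placeholder model
    input="What does the onboarding guide say about remote work?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [VECTOR_STORE_ID],
    }],
)

# The model answers using passages retrieved from the vector store
print(response.output_text)
```

This part works well; the problem starts when I try to add audio on top of it.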

However, I have not been able to use the Realtime audio model (which supports voice/text input and output) in combination with the vector store. When I instead take the text stream generated by a text model and pipe it into a TTS endpoint, I notice dropped audio segments, high latency, and a less fluid (unnatural) experience.
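To illustrate, this is a simplified sketch of the text-stream → TTS approach I am describing (the model names, voice, sentence-level chunking, and file output are assumptions for the example, not how I think it *should* be done):

```python
from openai import OpenAI

client = OpenAI()

VECTOR_STORE_ID = "vs_example123"  # placeholder

def stream_answer(question: str):
    """Stream text deltas from the Responses API, grounded via file_search."""
    with client.responses.stream(
        model="gpt-4o-mini",  # placeholder model
        input=question,
        tools=[{"type": "file_search", "vector_store_ids": [VECTOR_STORE_ID]}],
    ) as stream:
        for event in stream:
            if event.type == "response.output_text.delta":
                yield event.delta

def speak(text: str, path: str):
    """Synthesize one text chunk with the TTS endpoint and write it to a file."""
    with client.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts",  # placeholder TTS model
        voice="alloy",
        input=text,
        response_format="pcm",  # raw PCM chunks are easy to concatenate/play
    ) as tts:
        tts.stream_to_file(path)

# Naive chunking: buffer deltas until a sentence boundary, then synthesize.
buffer, n = "", 0
for delta in stream_answer("Summarize the refund policy"):
    buffer += delta
    if buffer.rstrip().endswith((".", "?", "!")):
        n += 1
        speak(buffer, f"chunk_{n}.pcm")
        buffer = ""
if buffer.strip():
    speak(buffer, f"chunk_{n + 1}.pcm")
```

With this kind of per-sentence synthesis and playback, the pauses between chunks and the occasional dropped chunk are exactly where the experience breaks down for me.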

I would like to know if anyone has been able to implement this flow: “Voice/Text input → Vector Store → Audio+Text output.”

Since this combination is not available natively, what would be the recommended architecture to achieve the smoothest possible audio output while still using OpenAI’s file search? I would also like to know, in your experience, which parameters, models, or audio formats have worked best for natural, low-latency voice playback.

I would appreciate any code examples and/or recommendations from those who have worked on similar implementations.

Thank you very much for your help!