Hi all,
We’re building a real-time conversational platform using LiveKit for audio/video communication, and we’re exploring ways to integrate it with the OpenAI Assistants API to retain conversation context without manually storing chat history.
The challenge is wiring everything together so that:
- The user speaks via audio in a LiveKit room
- We transcribe the audio using Whisper
- The transcript is sent to the Assistant via a Thread
- The assistant’s reply (text or TTS) is streamed back into LiveKit
- The assistant retains memory across the session
Is there any official guide, public example, or best-practice pattern for this kind of integration?
If there’s an alternate approach that supports persistent context with a conversational model in real-time audio, we’re open to ideas as well!
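For context, the manual-history fallback we're hoping to avoid looks roughly like this (a minimal sketch; `SessionHistory` is a made-up helper, and the resulting `messages()` list would be resent to a chat model on every turn):

```python
from collections import deque


class SessionHistory:
    """Rolling chat history: a system prompt plus the last N turns."""

    def __init__(self, system_prompt: str, max_turns: int = 20):
        self.system = {"role": "system", "content": system_prompt}
        # each turn is a user message plus an assistant message
        self.turns = deque(maxlen=max_turns * 2)

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

    def messages(self) -> list[dict]:
        """Full message list to send with the next model request."""
        return [self.system, *self.turns]


h = SessionHistory("You are a helpful voice assistant.", max_turns=2)
h.add("user", "hi")
h.add("assistant", "hello!")
h.add("user", "how are you?")
h.add("assistant", "great")
h.add("user", "bye")  # oldest entry rolls off once the cap is hit
print(len(h.messages()))  # → 5 (system prompt + last 4 entries)
```

This works, but it pushes truncation, persistence, and token budgeting onto us, which is exactly what we'd hoped Threads would handle.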
Would love to hear how others are solving this. Thanks in advance!