Adding voice (Whisper) input support to ChatKit

8coins · October 18, 2025, 3:02am

When using the ChatKit embed, is there a recommended way to capture microphone input,
transcribe the speech with Whisper, and feed that text into the same ChatKit conversation?

Or is native audio / microphone input planned for a future release of the embedded ChatKit?

mcfinley · October 18, 2025, 10:51am

I can’t speak to roadmap but my approach for this is to open a gpt-realtime conversation (have done websockets from py or webRTC from ts) to capture low-latency transcription and pipe it to my chatkit window as typed text. My chatkit agent has a reply field that I push back to realtime as text so the realtime model stays informed about what’s happening. Not counting on the realtime model to do any tool calling or reasoning on the reply other than to keep the user busy.

Bottom line… realtime does the low-latency audio in a super simple setup, chatkit is a smarter back-end that does the work. sort of a Cyrano de Bergerac setup. Turn detection is tricky and I don’t have it all nailed down but its much better than trying to detect audio turns just to do STT and TTS in and out of chatkit.

8coins · October 18, 2025, 2:39pm

Thank you for your quick response!

Topic		Replies	Views
Can speech-to-text be integrated in chatkit? API chatkit	1	108	January 7, 2026
Web Speech API with whisper API whisper	1	633	July 24, 2025
Transcribe via Whisper in real-time / live API whisper	4	35734	February 6, 2024
Speech to Text (Whisper) to Review (ChatGPT) API whisper	1	2306	October 4, 2023
How to Use Assistants API (Threads) in Real-Time Audio with LiveKit? Community assistants-api	0	121	May 15, 2025

Adding voice (Whisper) input support to ChatKit

Related topics