I built a TypeScript wrapper + simple client to make it easier for frontend developers to use WebRTC for real-time AI apps.
I’ve been exploring WebRTC for over a year, and OpenAI’s new semantic VAD opens up a whole new level of accessibility and UX. Unlike traditional voice activity detection, which reacts to raw audio energy, it uses a model to judge when you’ve actually finished speaking, so background noise no longer cuts your turn short or triggers false interruptions.
Docs: https://platform.openai.com/docs/guides/realtime-vad
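For context, semantic VAD is opted into through the session’s turn_detection setting. Here’s a minimal sketch of the session.update event you’d send over the WebRTC data channel (the semantic_vad type and eagerness field come from the docs above; the exact session shape may vary by API version):

declare const dataChannel: RTCDataChannel; // assumed: the channel from your peer connection

// Sketch: switch the session to semantic VAD over the negotiated data channel.
const sessionUpdate = {
  type: "session.update",
  session: {
    turn_detection: {
      type: "semantic_vad", // model-based end-of-turn detection instead of silence timeouts
      eagerness: "auto", // "low" | "medium" | "high" | "auto": how quickly to take a turn
    },
  },
};
dataChannel.send(JSON.stringify(sessionUpdate));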
This wrapper gives you full typed control over client/server events and UI state. Here’s the shape of the core config:
export interface RealtimeClientConfig {
  clientSecret: string; // ephemeral client secret from your backend, not a raw API key
  model?: string;
  realtimeUrl: string; // Realtime endpoint used for the WebRTC SDP exchange
  dataChannelLabel?: string; // label for the event data channel
  sessionType?: SessionType;
  onMessageToken?: (token: string) => void; // streaming text tokens
  onConnectionStateChange?: (state: ConnectionState) => void;
  onError?: (error: Error) => void;
  onConversationItemCreated?: (item: Item) => void;
  onResponseCreated?: (response: Response) => void;
  onResponseDone?: (response: Response) => void;
  onSpeechStarted?: () => void; // VAD: user started speaking
  onSpeechStopped?: () => void; // VAD: user stopped speaking
  onRawEvent?: (event: ServerEvent) => void; // escape hatch for every server event
  onUserTranscriptDelta?: (text: string) => void; // streaming user transcript
  onUserTranscriptDone?: (text: string) => void;
  onAssistantTranscriptDelta?: (text: string) => void; // streaming assistant transcript
  onAssistantTranscriptDone?: (text: string) => void;
  onTranscriptionError?: (error: Error) => void;
}
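To give a feel for wiring it up, here’s a hypothetical usage sketch. The import path, the RealtimeClient class, and the connect() method are assumptions for illustration (check the repo below for the actual exports), and the UI helpers are stand-ins for your own app code:

import { RealtimeClient } from "openai-realtime-webrtc"; // assumed import path

// Illustrative stand-ins your app would provide.
declare const clientSecret: string; // ephemeral key minted by your backend
declare function appendToTranscript(role: "user" | "assistant", text: string): void;
declare function setMicActive(active: boolean): void;

const client = new RealtimeClient({
  clientSecret,
  realtimeUrl: "https://api.openai.com/v1/realtime",
  model: "gpt-4o-realtime-preview",
  onUserTranscriptDelta: (text) => appendToTranscript("user", text),
  onAssistantTranscriptDelta: (text) => appendToTranscript("assistant", text),
  onSpeechStarted: () => setMicActive(true), // e.g. light up a mic indicator
  onSpeechStopped: () => setMicActive(false),
  onConnectionStateChange: (state) => console.log("connection:", state),
  onError: (err) => console.error(err),
});

await client.connect(); // assumed method: runs the SDP offer/answer handshake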
If you’re working on something voice-first or real-time, I’d love to hear your thoughts or see what you’re building!
Repo: https://github.com/mostafa-drz/openai-realtime-webrtc