RealtimeAPI: WebRTC (Client) + WebSocket (Server) possible?

gramiro · February 20, 2025, 11:28pm

Hi! I’m currently working on a project where there’s a need to have the client (browser) connecting through WebRTC to RealtimeAPI and the backend to connect with WebSockets using the same sessionId to perform operations as the user talks in the client.

Is this something that can be done? I already managed to connect the client with WebRTC and the server with WS to the same sessionId as the client but the WS never gets events for “message” rather than the “session.created” one.

mat.eo · February 20, 2025, 11:42pm

When you say server do you mean signaling server? Or whatever is hosting the AI as a client?

gramiro · February 20, 2025, 11:44pm

Hey mat.eo!

I mean an endpoint in the cloud that takes a sessionId param and stablishes a WS connection with that sessionId with Realtime.

So, the user will have the conversation in the browswer through WebRTC and the server in the cloud will be performing tasks according to the text transcripts from the WS connection to the same session.

j.wischnat · February 21, 2025, 6:48am

I think you mean a relay server? This is absolutely possible. I’ve done this with my project (Though it is WSS (Client) - WSS (Relay Server) - WSS (OpenAI), I don’t use WebRTC but it should be possible as well).
You just pass all the raw base64 encoded audio frames to the relay server which then sends it to the OpenAI websocket.

Cheers!

beatlz.too · February 21, 2025, 3:09pm

I think this is about listening to the events from a connection between the user and the Realtime API service from the backend. This makes sense because, if frontend has the session token and with this token they can update the settings of the object created from backend, then they could abuse it without backend noticing it, no?

mat.eo · February 21, 2025, 4:23pm

Agreed. A relay server would be the only option. You’d need a “dummy” client that sits in the WebRTC channel to collect and distribute all the packets of the data.

beatlz.too · February 22, 2025, 9:33am

There’s something I don’t understand, if the server can never listen to the events of a session it created, wouldn’t this open vulnerabilities? For example:

Client requests a session_id from server
Server uses API key to create the session with a set of parameters
Gives it back to client
Client uses this session_id to abuse the session (e.g. increases the max tokens allowed and used model)
Server cannot know this because there’s no way to listen to the session

I must be missing something

mat.eo · February 22, 2025, 5:05pm

As far as I understand it, it technically would be listening.

All the information would be passed from the dummy client in the WebRTC channel to the WebSocket server. I’m not sure how plausible this whole scenario is (latency wise), but it’s an option

beatlz.too · February 22, 2025, 7:05pm

If there’s this dummy RTC, it’d be on backend and frontend would talk to this, no?

It would double the latency in theory. I don’t know if this is something that’s noticeable by the user, but it is twice as many events.

I feel like the obvious way to do this is to let frontend have a direct connection with the Realtime API with the session token, but restricting it from changing the settings or at least letting backend know if the session has been updated.

I don’t understand the point of having a session token pattern if we still have to connect backend directly, we could just use the API Auth token. That’s why I think I might be missing something.

mat.eo · February 22, 2025, 8:55pm

There is no traditional “back-end” for WebRTC besides usually server(s) that facilitate connection and sometimes media transfer. That’s the purpose of it: direct p2p communications. The OpenAI libraries (AFAIK) abstract away all the complexities of handling these protocols.

I’m approaching this question purely from a WebRTC → Websocket perspective, and not including the libraries that make it easier to manage, maybe that’s where the disconnect is happening?

I’m gathering that your reference to the “back-end” is the OpenAI server.

icdev2dev · February 22, 2025, 10:05pm

There is a expiration time associated with the ephermeral key. This means the user cannot run-away (forever) with examining the source in browser.

The fact that one cannot read whether the session has been updated or not kind of sucks. For the moment then, you can offer this service to someone you trust. After some time (weeks/months), I am relatively sure that OpenAI will give read level api access

beatlz.too · February 23, 2025, 8:49pm

There is no traditional “back-end” for WebRTC

I understand this, but traditional backend creates the session token (ephemeral key) and serves it to the frontend to connect to OpenAI’s WebRTC.

Sorry if I was not being clear. The way that I picture a real service using this is something like

User wants to have a voice conversation in frontend
Backend calls OpenAI’s POST and gets the sessionID with some settings
Hands it back to the frontend
Frontend connects directly to OpenAI’s Realtime API using the session token to establish a WebRTC connection

Then, the traditional backend that owns the OpenAI’s auth key doesn’t know how frontend is using this. I get it for privacy reasons, but I’d expect for backend to be able to tell if the settings of the session object are being changed, which is possible according to the docs. They could change things like that max output tokens, the instructions, model, and some other stuff. Is this less sensitive than I think?

beatlz.too · February 23, 2025, 8:51pm

Yeah, that’s my read as well. Thanks for taking the time to read and reply : )

Topic		Replies	Views
Realtime API WebRTC best security & use practices API realtime	10	1248	September 30, 2025
WebRTC - can anyone read our prompts from the JS code? API realtime , api-realtime , api-realtime-speech	7	650	December 24, 2025
Can WebRTC Be Used for a Real-Time Text-to-Text Chatbot Instead of WebSockets? API	8	860	September 22, 2025
Realtime api - get conversation transcription from backend while using webrtc API realtime	2	499	February 3, 2025
Realtime API: how to track consumption per user with a direct WebRTC communication? API realtime	0	330	August 18, 2025

RealtimeAPI: WebRTC (Client) + WebSocket (Server) possible?

Related topics