Realtime API WebRTC: security and usage best practices

Hello there!

I’m building a project that uses the OpenAI Realtime API via WebRTC, and it made me wonder about the recommended security practices.

Based on the official docs, the flow is as follows:

  • UI → Server: requests an ephemeral key
  • Server → OpenAI: a request to https://api.openai.com/v1/realtime/sessions creates a session and returns the ephemeral key along with session parameters such as the ID
  • Server → UI: returns the ephemeral key
  • UI → OpenAI: a request to https://api.openai.com/v1/realtime?model=${model} begins a session using the ephemeral key
  • UI → OpenAI via WebRTC: establishes a peer connection and proceeds with the conversation
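
For anyone following along, here is a minimal TypeScript sketch of the "Server → OpenAI" step from the list above. It is not an official implementation: `buildSessionRequest`, `mintEphemeralKey`, and `FetchLike` are hypothetical names, the request body containing only `model` is an assumption, and so is the response shape (`id` plus `client_secret.value`). Check the current docs before relying on any of it.

```typescript
// Minimal structural type so the sketch works with the global fetch or a test double.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Endpoint taken from the flow described in the post.
const SESSIONS_URL = "https://api.openai.com/v1/realtime/sessions";

// Builds the server-side request. The long-lived API key never leaves the server.
function buildSessionRequest(apiKey: string, model: string) {
  return {
    url: SESSIONS_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // Assumption: a body with just the model name is enough to create a session.
      body: JSON.stringify({ model }),
    },
  };
}

// Creates a session and returns only what the UI needs.
async function mintEphemeralKey(
  fetchFn: FetchLike,
  apiKey: string,
  model: string,
): Promise<{ sessionId: string; ephemeralKey: string }> {
  const { url, init } = buildSessionRequest(apiKey, model);
  const res = await fetchFn(url, init);
  if (!res.ok) {
    throw new Error(`session request failed with status ${res.status}`);
  }
  // Assumption: the response carries `id` and `client_secret.value`.
  const data = (await res.json()) as {
    id?: string;
    client_secret?: { value?: string };
  };
  const ephemeralKey = data.client_secret?.value;
  if (!data.id || !ephemeralKey) {
    throw new Error("unexpected session response shape");
  }
  return { sessionId: data.id, ephemeralKey };
}
```

On the server this would be called with the global `fetch` as `fetchFn`. Keeping the session ID next to the user record at this point at least gives the server a log of which user was issued which key.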

Once the conversation is over, the UI closes the peer connection, which ends the Realtime session.

There are a few things that worry me:

  • There is no way for the Server to stop the conversation by sending OpenAI a request with the session ID. This would be useful for restricting the duration of a conversation server-side (say, based on the user’s “credits”), since client-side restrictions are easy to bypass.
  • There is no way for the Server to get session metadata (duration, costs, status, etc.) by session ID. This would help prevent misuse, which is easy to achieve by sniffing the ephemeral key that the Server returns and using it outside the UI app. Also, if a service charges its users based on conversation duration or tokens used, it has no way to learn the exact numbers for a particular session, which opens the door to abuse.

I understand that this can be solved by using WebSockets on the server side, but that brings its own implementation complexity and additional network traffic costs, since cloud providers charge for inbound/outbound traffic, and with audio the volume is quite noticeable.
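
To put rough numbers on the traffic concern, here is a back-of-the-envelope TypeScript sketch. It assumes uncompressed 16-bit PCM at 24 kHz mono, streamed continuously in one direction and base64-encoded inside JSON events on the WebSocket. Silence handling, JSON framing overhead, and other audio formats are ignored. `relayAudioBytes` and `egressCostUsd` are hypothetical helpers, and the per-GB price is whatever your provider charges.

```typescript
// Assumption: pcm16 audio at 24 kHz, mono.
const SAMPLE_RATE_HZ = 24_000;
const BYTES_PER_SAMPLE = 2;

// Bytes for one direction of continuous audio, raw and base64-encoded.
function relayAudioBytes(seconds: number): { rawOneWay: number; base64OneWay: number } {
  const rawOneWay = seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE;
  // Base64 turns every 3 input bytes into 4 output characters.
  const base64OneWay = Math.ceil(rawOneWay / 3) * 4;
  return { rawOneWay, base64OneWay };
}

// Converts a byte count to a cost, given a provider's price per GB (10^9 bytes).
function egressCostUsd(bytes: number, usdPerGb: number): number {
  return (bytes / 1e9) * usdPerGb;
}

const oneMinute = relayAudioBytes(60);
console.log(oneMinute.rawOneWay);    // 2880000 bytes of raw audio
console.log(oneMinute.base64OneWay); // 3840000 bytes once base64-encoded
```

A relay carries this volume on both of its legs (client to server and server to OpenAI) and in both directions, so the billable amount depends on which of those legs the provider charges for. Plug your own rate into `egressCostUsd` to compare.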

A question for fellow developers and builders out there: how do you handle this?

Also, a question for the OpenAI team: are there any plans to extend the Session API with endpoints to do the following?

  • stop the session forcefully
  • get session status and metadata (e.g., costs & duration)

Thanks, and have fun.

It’s a great question and I will make sure it gets raised with OpenAI.

Thanks a lot!

Please let me know once there is any news from OpenAI on that.

I’m very keen to hear the answer too, as I’m facing the same problem.