Realtime API: how to track consumption per user with a direct WebRTC communication?

ludovico · August 18, 2025, 10:58am

Hi,

I’m implementing an app using the Realtime API. I want to let users have real-time voice conversations with the model, but I need to track how much each user consumes (ideally in minutes, so that I can offer a clear pricing plan to the end user; if that’s not possible, at least in tokens).

I have seen this implementation from Sergey Krivov:

https://github.com/skrivov/openai-voice-webrtc-next

As far as I have seen, that demo follows these steps:

1. From the browser, we send a request to the backend asking to start a session.
2. From the backend, we send a request to OpenAI to generate an ephemeral token.
3. That ephemeral token is sent to the client (so that the API key is not exposed to the client).
4. The client then uses the ephemeral token to establish a direct connection with OpenAI. From that point on, as far as I’ve seen, the communication is direct between the browser and OpenAI’s servers.
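
To make the backend part of that flow concrete (requesting the ephemeral token and handing it to the client), here is a minimal sketch of how I understand it. The endpoint URL, the model name, and the `client_secret.value` response field are assumptions from my reading of the demo and may differ by API version; `mintEphemeralToken`, `HttpPost`, and `issuedTokens` are hypothetical names of my own. The HTTP call is passed in as a parameter so the function does not depend on a particular runtime.

```typescript
// Sketch only: the endpoint, model name, and response shape are assumptions
// based on my reading of the demo; check them against the current API docs.
const SESSIONS_URL = "https://api.openai.com/v1/realtime/sessions"; // assumed
const MODEL = "gpt-4o-realtime-preview"; // assumed placeholder

interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<any>;
}
type HttpPost = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<HttpResponse>;

interface IssuedToken {
  userId: string;
  issuedAtMs: number;
}

// Hypothetical in-memory log: which user got a token, and when.
const issuedTokens: IssuedToken[] = [];

// The backend asks OpenAI for an ephemeral token and returns only that token
// to the browser, so the real API key never leaves the server.
async function mintEphemeralToken(
  userId: string,
  apiKey: string,
  post: HttpPost,
  now: () => number = Date.now
): Promise<string> {
  const res = await post(SESSIONS_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: MODEL }),
  });
  if (!res.ok) {
    throw new Error(`Token request failed with status ${res.status}`);
  }
  const data = await res.json();
  const token = data?.client_secret?.value; // assumed response field
  if (typeof token !== "string") {
    throw new Error("Response did not contain an ephemeral token");
  }
  // The only thing the server learns here is when a session was allowed to start.
  issuedTokens.push({ userId, issuedAtMs: now() });
  return token;
}
```

With this setup the backend knows who received a token and when, but nothing about what happens on the connection afterwards, which is the gap I am asking about.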

To track consumption, the only options that come to mind are the following:

A) I can send an HTTP request from the client to my server to report the duration of each session. Problem: this doesn’t seem like a very safe approach (a malicious user could fake those reports and exploit the system to have much longer or more expensive conversations than they are billed for).
B) I could implement some kind of proxy, so that all the communication goes through my backend (i.e. the browser would not connect to OpenAI directly). Problems: apart from the extra setup (I’d need to implement that logic in the backend), I am concerned that this configuration could add a significant amount of latency.
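
To show why option A worries me, here is a minimal sketch of what the server-side handler would look like. All names (`usageSeconds`, `recordReportedDuration`) are hypothetical and the storage is just an in-memory map. The point is that the server can validate the format of the number but has no independent way to know whether it is true.

```typescript
// Hypothetical handler for option A: the server stores whatever duration
// the client reports at the end of a session.
const usageSeconds = new Map<string, number>();

function recordReportedDuration(userId: string, reportedSeconds: number): number {
  // The server can reject malformed values...
  if (!Number.isFinite(reportedSeconds) || reportedSeconds < 0) {
    throw new RangeError("reportedSeconds must be a non-negative number");
  }
  // ...but it cannot tell whether a well-formed value matches the real call.
  const total = (usageSeconds.get(userId) ?? 0) + reportedSeconds;
  usageSeconds.set(userId, total);
  return total;
}

// An honest client reports a 10-minute call.
recordReportedDuration("alice", 600);
// A tampered client talks for much longer but reports one second;
// the server accepts it just the same.
recordReportedDuration("mallory", 1);
```

Both reports are stored without complaint, so the billing is only as accurate as the client is honest.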

So, my main question is:

Is there a reliable way to track how long each session lasts (or how many tokens it consumes), while allowing a direct WebRTC communication between browser and OpenAI?

For example, I wonder whether OpenAI offers any way to check the consumption of specific sessions or ephemeral tokens. Or is there any way to invoke tools on the server (not on the client), receive notifications on the server (a webhook or similar), or check periodically whether a connection is still active?

If that’s not possible, would a service like LiveKit/Agora/Stream solve this problem while keeping latency low? (I don’t have experience with any of them.)
