Realtime API Pricing: VAD and Token Accumulation - A KILLER

First, I think the other voices are of better quality. Second, I am not convinced that removing the previous chat history should affect voice quality in any way. Here's why I think this cost-cutting approach should not affect the voice.

  1. The voice messages we delete do not carry metadata about emotion, tone, etc.
  2. As far as the context of the conversation is concerned, we are still providing essentially the whole context, just in a more concise form.
  3. Each voice generation by the model should, in terms of tone continuity, depend largely on the last 1 or 2 turns it took, and we already keep those as a buffer at all times (a rough sketch of what I mean follows this list).
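
To make point 3 concrete, here is a minimal sketch of what keeping the last couple of turns as a buffer could look like. It assumes an already-open Realtime API websocket (`ws`, e.g. from the `websockets` library) and that item IDs are tracked from `conversation.item.created` server events; the function name and the turn-counting logic are purely illustrative, and only the `conversation.item.delete` event shape comes from the API itself.

```python
# Minimal sketch: keep only the last few turns alive server-side.
import json

KEEP_LAST_TURNS = 2        # roughly the last 1-2 turns kept as a tone buffer
item_ids: list[str] = []   # oldest -> newest, appended from server events


async def on_item_created(event: dict, ws) -> None:
    """Record each new conversation item, then prune anything older than the buffer."""
    item_ids.append(event["item"]["id"])

    # Treat a "turn" as one user item plus one assistant item.
    keep_items = KEEP_LAST_TURNS * 2
    while len(item_ids) > keep_items:
        stale_id = item_ids.pop(0)
        # Deleting the item removes it from the server-side conversation state,
        # so its tokens are no longer re-billed as input on later responses.
        await ws.send(json.dumps({
            "type": "conversation.item.delete",
            "item_id": stale_id,
        }))
```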

Here’s what I think could be the problem.

  1. We are feeding the summarized context via a system message. Ideally, this would have been done via conversation items, but as I noted earlier, the API currently does not support that and stops producing audio. (See the sketch below for the two routes I am contrasting.)
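
For concreteness, here is a rough sketch of those two routes, assuming the "system message" route means folding the summary into the session instructions and the "conversation items" route means creating an assistant-role text item. The function names and the `ws`/`summary` variables are purely illustrative; only the event shapes come from the Realtime API.

```python
import json


async def inject_summary_as_system_message(ws, summary: str) -> None:
    """The 'system message' route, read as updating the session instructions."""
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {
            "instructions": f"Summary of the conversation so far: {summary}",
        },
    }))


async def inject_summary_as_conversation_item(ws, summary: str) -> None:
    """The 'conversation items' route: an assistant-role item carrying the
    summary text. This is the variant where, in our testing, the model then
    stops producing audio on subsequent responses."""
    await ws.send(json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "text", "text": summary}],
        },
    }))
```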

Another way of feeding the summarized context, if it must go through conversation items, is to always keep at least the last 3 audio responses from the model. This has two problems: first, obviously, a higher cost; and second, it frankly sounds like a hack and is based only on experimental observation. There is no guarantee it will work deterministically in production.
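
Sketched out, that workaround would look something like the following; `items` (a local mirror of the server-side conversation holding each item's `id` and `role`) and the `ws` connection are assumptions, as in the earlier sketch.

```python
import json

KEEP_ASSISTANT_AUDIO = 3   # always retain the model's last 3 audio responses


async def prune_all_but_recent_audio(ws, items: list[dict]) -> None:
    """Delete every conversation item except the newest assistant responses."""
    assistant_ids = [it["id"] for it in items if it["role"] == "assistant"]
    keep = set(assistant_ids[-KEEP_ASSISTANT_AUDIO:])
    for it in items:
        if it["id"] not in keep:
            await ws.send(json.dumps({
                "type": "conversation.item.delete",
                "item_id": it["id"],
            }))
```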

While testing the strategy/hypothesis presented by @zia.khan, I am reminded of this comment from @stevenic:

@zia.khan how did you arrive at the conclusion that

What @stevenic says lines up with the rationale for carrying forward audio tokens. What @zia.khan seems to have observed (yet to be verified) makes one question the purpose of carrying forward the very tokens that are ultimately responsible for the cost inflation.

Also revisiting @jeffsharris's note from earlier:

@jeffsharris, any chance you could clarify this for us?
