Let's break down the input/output token details together!

The answer is as simple as this:

Handling long conversations
If a conversation goes on for a sufficiently long time, the input tokens the conversation represents may exceed the model's input context limit (e.g. 128k tokens for GPT-4o). At this point, the Realtime API automatically truncates the conversation based on a heuristic-based algorithm that preserves the most important parts of the context (system instructions, the most recent messages, and so on). This allows the conversation to continue uninterrupted.
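To get a feel for when that automatic truncation kicks in, here's a minimal sketch that estimates how many more turns fit under the 128k limit mentioned above. The helper name and the per-turn token counts are hypothetical illustration values, not anything the API provides:

```python
# Sketch: estimate when a growing conversation will hit the context limit.
# The 128k figure is GPT-4o's input context limit; the per-turn token
# counts below are made-up illustration values.

CONTEXT_LIMIT = 128_000

def turns_until_truncation(tokens_so_far: int, avg_tokens_per_turn: int) -> int:
    """How many more turns fit before the API starts truncating context."""
    remaining = CONTEXT_LIMIT - tokens_so_far
    if remaining <= 0:
        return 0
    return remaining // avg_tokens_per_turn

# e.g. 90k tokens accumulated, roughly 2k tokens added per turn:
print(turns_until_truncation(90_000, 2_000))  # 19
```

In practice you'd feed this from the `usage` figures the API reports per response, rather than guessing an average.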

Saying “hi again” in a long-running voice session? Or even a blip of background noise that triggers a response? Every turn resends the entire accumulated context as input, and at the audio input rate of
$10.00 / 100k input tokens
that one tiny turn can bill close to a full context window's worth.

“conversation.item.truncate” lets you trim the audio of a recent assistant response: you reference the correct item and content part (the audio modality chunk) plus an end time, so only a portion of the audio is affected. A particular use: when the user interrupts playback, truncating to what was actually heard means the AI's own context reflects that it was cut off, rather than it “remembering” audio the user never heard.
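A minimal sketch of building that client event as JSON, per the Realtime API event shape (`type`, `item_id`, `content_index`, `audio_end_ms`). The item ID value below is a placeholder; in a real client it comes from your own record of the assistant's response item:

```python
import json

# Sketch: build a conversation.item.truncate client event payload.
# item_id is a placeholder; audio_end_ms marks how much audio to keep.

def truncate_event(item_id: str, audio_end_ms: int, content_index: int = 0) -> str:
    """JSON payload you would send over the Realtime websocket."""
    return json.dumps({
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": content_index,  # which content part of the item
        "audio_end_ms": audio_end_ms,    # keep only the first N ms of audio
    })

# Cut the assistant's audio at the 1.5-second mark (placeholder item ID):
payload = truncate_event("item_ABC123", 1500)
```

You would send `payload` as a text frame on the open websocket; the server answers with a `conversation.item.truncated` event on success.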

“conversation.item.delete” removes one turn, referenced specifically by the item ID your client recorded when it was created. You are maintaining and synchronizing your own chat history regardless of a stateful API being offered, right? There is no list method to recover item IDs from the server, so if a server error cost you your record, you cannot enumerate them.
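That client-side bookkeeping can be as simple as recording IDs from each `conversation.item.created` server event and referencing them later. A sketch, with placeholder IDs:

```python
import json

# Sketch: keep your own record of item IDs (there is no server-side list
# method), then reference them in conversation.item.delete. IDs are placeholders.

item_ids: list = []  # client-side record of the conversation's items

def on_item_created(event: dict) -> None:
    """Record the item ID from a conversation.item.created server event."""
    item_ids.append(event["item"]["id"])

def delete_event(item_id: str) -> str:
    """Build a conversation.item.delete client event payload."""
    return json.dumps({"type": "conversation.item.delete", "item_id": item_id})

# Simulated server event arriving, then deleting that turn later:
on_item_created({"type": "conversation.item.created", "item": {"id": "item_001"}})
payload = delete_event(item_ids[0])
```

The server confirms with a `conversation.item.deleted` event, at which point you'd drop the ID from your local record too.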

Not permitted: setting a maximum token threshold on the session's context, or converting a voice turn into solely the text transcript you also purchased, dropping the audio, so that the context no longer retains any audio whatsoever.