Realtime API cost anomaly: disproportionate charges on audio input

arthurclmt · June 11, 2025, 9:50am

Hello,

We are using the Realtime API (gpt-4o-realtime-preview-2024-12-17). When reviewing the usage dashboard for a 15-minute session, I noticed that the cost for audio input was $5.28, while the cost for audio output was $0.65.

This seems inconsistent with expected behavior. During the session, I used very short input sentences, while the model responded with longer outputs. According to the Realtime pricing (per 1M tokens), audio input is billed at $40 and audio output at $80. On that basis, the output cost should be higher than the input cost.

By this logic, the cost for audio input in this session should be lower than $0.65, not $5.28.
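To make the comparison concrete, the dashboard costs can be converted back into implied token counts at the per-1M rates quoted above. This is a minimal Python sketch. It assumes that those rates applied to this session and that every billed token was charged at the listed rate. The function name `implied_tokens` is hypothetical.

```python
# Per-1M-token rates quoted in the post (USD); assumed to apply to this session.
RATES_PER_1M = {
    "audio input": 40.00,
    "audio output": 80.00,
}

# Dashboard costs for the 15-minute session (USD), copied from the post.
COSTS = {
    "audio input": 5.28,
    "audio output": 0.65,
}


def implied_tokens(cost_usd: float, rate_per_1m: float) -> int:
    """Tokens that would produce cost_usd at rate_per_1m dollars per 1M tokens."""
    return round(cost_usd / rate_per_1m * 1_000_000)


for item, cost in COSTS.items():
    tokens = implied_tokens(cost, RATES_PER_1M[item])
    print(f"{item}: ${cost:.2f} -> ~{tokens:,} tokens")
```

At these rates, the session billed about 132,000 audio input tokens against about 8,125 audio output tokens. That is roughly 16 times more input than output, even though the spoken input was short.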

We use the OpenAI dashboard. Below is all the data for the 15-minute session, with the associated costs.

realtime api | gpt-4o-realtime-preview-2024-12-17 audio, input
Cost: $5.28

realtime api | gpt-4o-realtime-preview-2024-12-17 audio, cached input
Cost: $0.57

realtime api | gpt-4o-realtime-preview-2024-12-17 audio, output
Cost: $0.65

realtime api | gpt-4o-realtime-preview-2024-12-17 text, input
Cost: $0.43

realtime api | gpt-4o-realtime-preview-2024-12-17 text, cached input
Cost: $0.43

realtime api | gpt-4o-realtime-preview-2024-12-17 text, output
Cost: $0.05

gpt-4o-transcribe audio, input
Cost: <$0.01

gpt-4o-transcribe text, input
Cost: <$0.01

gpt-4o-transcribe text, output
Cost: <$0.01

text-embedding-3-small
Cost: <$0.01

For reference, we are in a quiet environment and manually activate the microphone by holding down a button when speaking.

Has anyone experienced something similar and figured out what was going on?

Thanks

arthurclmt · June 18, 2025, 2:08pm

[Update]
@OpenAI_Support @gokulraya @jeffsharris

We changed the Realtime version from ‘gpt-4o-realtime-preview-2024-12-17’ to ‘gpt-4o-realtime-preview-2025-06-03’

For a 15-minute session, the cost for audio input is now $0.33, the cost for cached audio input is $0.80, and the cost for audio output is $0.80.

It still doesn’t match the published Realtime pricing.

First, I don’t understand why the cost of cached audio input is higher than the cost of audio input.

Second, regarding OpenAI’s announcement:
Audio input is priced at $100 per 1M tokens […] This equates to approximately $0.06 per minute of audio input.
https://openai.com/index/introducing-the-realtime-api/

Currently the audio input pricing is $40 per 1M tokens, approximately $0.024 per minute of audio input.

For audio cached input, pricing is currently $2.50 per 1M tokens, approximately $0.0015 per minute of audio cached input.

The audio input cost is $0.33, which corresponds to approximately 13.8 minutes of speech. However, this is not realistic, as I did not speak for 13.8 minutes during the 15-minute session.

The cached audio input cost is $0.80, which translates to approximately 533 minutes of audio. This is clearly impossible given the session duration.
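The per-minute figures above can be reproduced with a short Python sketch. The sketch assumes 600 audio tokens per minute, which is the ratio implied by the announcement ($100 per 1M tokens ≈ $0.06 per minute). It also assumes that the same ratio holds for cached audio input. The helper names are hypothetical.

```python
# Assumed tokens per minute of input audio, derived from the announcement:
# $0.06 per minute / $100 per 1M tokens = 600 tokens per minute.
TOKENS_PER_MINUTE = 600


def per_minute_price(rate_per_1m: float) -> float:
    """Approximate USD per minute of audio at a given per-1M-token rate."""
    return rate_per_1m * TOKENS_PER_MINUTE / 1_000_000


def implied_minutes(cost_usd: float, rate_per_1m: float) -> float:
    """Minutes of audio that would produce cost_usd at rate_per_1m."""
    return cost_usd / per_minute_price(rate_per_1m)


# Costs from the 15-minute session on gpt-4o-realtime-preview-2025-06-03.
for label, cost, rate in [
    ("audio input", 0.33, 40.00),
    ("cached audio input", 0.80, 2.50),
]:
    print(f"{label}: ${per_minute_price(rate):.4f}/min, "
          f"~{implied_minutes(cost, rate):.1f} min")
```

This only checks the arithmetic. The conversion treats every billed token as one pass over newly spoken audio, and that is exactly the assumption in question for cached input.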

The audio output cost of $0.80 for the 15-minute session appears consistent.

Could someone help with this?

Thanks.

arthurclmt · June 20, 2025, 11:45am

Hello,

In addition, we are observing unexpectedly high costs related to text input and text cached input in the Realtime API.

Here is a breakdown of our usage:

The instruction prompt for Realtime: 262 tokens
Function calling definition: 996 tokens
Text input exchanged during the 15-minute session: up to 5,000 tokens
Spoken input: up to 250 tokens

Maximum estimated usage: ~6,500 tokens

However, the usage reported on the dashboard is significantly higher:

Text input: $0.05 → approximately 10,000 tokens

Text cached input: $0.54 → approximately 216,000 tokens

This results in a total of about 226,000 tokens, far beyond our expected maximum of ~6,500.
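The comparison between expected and reported text usage can be checked with a small Python sketch. The text rates ($5.00 and $2.50 per 1M tokens) are inferred from the conversions above ($0.05 → ~10,000 tokens, $0.54 → ~216,000 tokens). They are assumptions, not figures quoted from a price list. The budget items are copied from the breakdown.

```python
# Text rates per 1M tokens (USD), inferred from the post's own conversions.
TEXT_INPUT_RATE = 5.00
TEXT_CACHED_RATE = 2.50

# Expected per-session budget from the post (tokens).
EXPECTED = {
    "instruction prompt": 262,
    "function definitions": 996,
    "text exchanged": 5_000,
    "spoken input": 250,
}


def tokens_from_cost(cost_usd: float, rate_per_1m: float) -> int:
    """Tokens that would produce cost_usd at rate_per_1m dollars per 1M tokens."""
    return round(cost_usd / rate_per_1m * 1_000_000)


expected_total = sum(EXPECTED.values())
reported_input = tokens_from_cost(0.05, TEXT_INPUT_RATE)
reported_cached = tokens_from_cost(0.54, TEXT_CACHED_RATE)
reported_total = reported_input + reported_cached

print(f"expected: {expected_total:,} tokens")
print(f"reported: {reported_input:,} input + {reported_cached:,} cached "
      f"= {reported_total:,} tokens")
print(f"ratio: {reported_total / expected_total:.0f}x")
```

Under these assumptions, the reported text usage is roughly 35 times the expected budget. Almost all of the excess is in the cached input.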

Could you please help us understand where this additional usage might be coming from, and whether this could be an error in token accounting?
@OpenAI_Support

Thank you in advance.

arthurclmt · June 24, 2025, 1:14pm

Would it be possible to get support from OpenAI on this?

Thanks!

arthurclmt · July 3, 2025, 8:57am

Hello,

Is anyone from OpenAI able to help?

This is a key topic, as it relates to usage and billing.

Thanks

_j · July 4, 2025, 11:38am

I think this is just a misunderstanding of how the technology works.

Despite being “realtime”, the generation of a response to you is turn-based.

A server-side message history is maintained and appended to with every new generation, whether the generation is triggered by sending an API event after appending audio to the buffer or by the end of voice activity detection. You are not given a cost-management mechanism; the history just keeps growing.

The cached figure means you are being billed again for input that was already seen in previous response generations. The entire conversation must be passed to the model again so that it can respond appropriately to the latest turn.

Example input to the model being billed:

Turn 1:

user: “Please permanently talk like an Australian”

Turn 2:

user: “Please permanently talk like an Australian” (cached)
ai: “G’day mate, dinkum wallaby on the barbie. Let’s crack on!”
user: “Not a stereotype, an accurate accent.”
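The effect of re-sending the history can be illustrated with a toy Python model. The model rests on a simplifying assumption, not on OpenAI’s documented accounting: everything already in the history before a turn is billed as cached input, and only the new user message is billed as fresh input. On the first turn, the instructions and tool definitions are billed as fresh input. The session shape below (30 turns of 20 user tokens and 100 assistant tokens) is hypothetical. The 1,258-token prefix is 262 + 996 from the breakdown earlier in the thread.

```python
def billed_input(prefix_tokens: int,
                 turns: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return (fresh_input, cached_input) token counts for each response.

    prefix_tokens: instructions + tool definitions re-sent with every response.
    turns: (user_tokens, assistant_tokens) for each turn, in order.
    Simplified model (an assumption): whatever was already in the history
    before this turn is billed as cached; the new input is billed as fresh.
    """
    history = 0  # tokens already seen by the model before this turn
    billed = []
    for user_tokens, assistant_tokens in turns:
        if history == 0:
            fresh = prefix_tokens + user_tokens  # first turn: nothing cached yet
        else:
            fresh = user_tokens
        billed.append((fresh, history))
        # The new input and the model's reply join the history for the next turn.
        history += fresh + assistant_tokens
    return billed


# Hypothetical session: 30 short user turns, each with a longer reply.
billed = billed_input(1_258, [(20, 100)] * 30)
fresh_total = sum(fresh for fresh, _ in billed)
cached_total = sum(cached for _, cached in billed)
print(f"fresh: {fresh_total:,}  cached: {cached_total:,}")
```

In this toy model, fresh input grows linearly with the number of turns, while cached input grows roughly quadratically. A long session made up of short user turns can therefore accumulate far more cached input than fresh input.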

arthurclmt · July 9, 2025, 8:11am

Hello @_j,
Thank you for your response.

Even considering your explanation, the audio input cost of $0.33 seems high compared to the actual spoken content during the session. The same applies to the cached audio input cost of $0.80.

Additionally, the costs for text input and cached text input are also higher than expected.

We believe there may be an issue with the current calculation of Realtime API usage, especially when compared to the information published by OpenAI in their official announcement: https://openai.com/index/introducing-the-realtime-api
