Realtime API cost anomaly: disproportionate charges on audio input

Hello,

We are using the Realtime API (gpt-4o-realtime-preview-2024-12-17). When reviewing the usage dashboard for a single 15-minute session, we noticed that the audio input cost was $5.28, while the audio output cost was only $0.65.

This seems inconsistent with expected behavior. During the session, we spoke only very short input sentences, while the model responded with longer outputs. According to the Realtime pricing, audio input is billed at $40 per 1M tokens and audio output at $80 per 1M tokens. With shorter inputs and a lower input rate, we expected the output cost to be higher than the input cost.

By this logic, the cost for audio input in this session should be lower than $0.65, not $5.28.
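To make the mismatch concrete, here is a rough back-of-the-envelope sketch that converts the two dashboard costs into implied token counts. It assumes the $40 and $80 per 1M token rates quoted above apply to the whole session; the function and variable names are just for illustration.

```python
# Rates quoted above (USD per 1M tokens); assumed to apply to the whole session.
AUDIO_INPUT_USD_PER_1M = 40.0
AUDIO_OUTPUT_USD_PER_1M = 80.0


def implied_tokens(cost_usd: float, usd_per_1m: float) -> int:
    """Return the token count implied by a cost at a given per-1M-token rate."""
    return round(cost_usd / usd_per_1m * 1_000_000)


# Costs taken from the usage dashboard for the 15-minute session.
input_tokens = implied_tokens(5.28, AUDIO_INPUT_USD_PER_1M)
output_tokens = implied_tokens(0.65, AUDIO_OUTPUT_USD_PER_1M)

print(f"implied audio input tokens:  {input_tokens:,}")
print(f"implied audio output tokens: {output_tokens:,}")
print(f"input/output token ratio:    {input_tokens / output_tokens:.1f}x")
```

If those rates are right, we were billed for about 132,000 audio input tokens versus about 8,125 audio output tokens, i.e. roughly 16 times more input than output, which does not match how little we actually spoke.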

The figures below are taken from the OpenAI usage dashboard and cover all line items for the 15-minute session, with their associated costs.

realtime api | gpt-4o-realtime-preview-2024-12-17 audio, input
Cost: $5.28

realtime api | gpt-4o-realtime-preview-2024-12-17 audio, cached input
Cost: $0.57

realtime api | gpt-4o-realtime-preview-2024-12-17 audio, output
Cost: $0.65

realtime api | gpt-4o-realtime-preview-2024-12-17 text, input
Cost: $0.43

realtime api | gpt-4o-realtime-preview-2024-12-17 text, cached input
Cost: $0.43

realtime api | gpt-4o-realtime-preview-2024-12-17 text, output
Cost: $0.05

gpt-4o-transcribe audio, input
Cost: <$0.01

gpt-4o-transcribe text, input
Cost: <$0.01

gpt-4o-transcribe text, output
Cost: <$0.01

text-embedding-3-small
Cost: <$0.01
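For a sense of scale, this small sketch adds up the line items above and computes how much of the total the audio input accounts for. The items shown as <$0.01 are left out because their exact values are not displayed, so the total is a slight underestimate; the dictionary keys are just shorthand labels.

```python
from decimal import Decimal

# Realtime API line items from the dashboard (USD).
# Items reported as "<$0.01" are omitted since their exact values are unknown.
costs = {
    "audio input": Decimal("5.28"),
    "audio cached input": Decimal("0.57"),
    "audio output": Decimal("0.65"),
    "text input": Decimal("0.43"),
    "text cached input": Decimal("0.43"),
    "text output": Decimal("0.05"),
}

total = sum(costs.values(), Decimal("0"))
audio_input_share = costs["audio input"] / total

print(f"session total (excluding <$0.01 items): ${total}")
print(f"audio input share of total: {audio_input_share:.0%}")
```

So audio input alone makes up roughly 71% of the approximately $7.41 billed for the session.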

For reference, we are in a quiet environment and use push-to-talk: the microphone is only active while we hold down a button to speak.

Has anyone experienced something similar and figured out what was going on?

Thanks