Hello,
We are using the Realtime API (gpt-4o-realtime-preview-2024-12-17). When reviewing the usage dashboard for a 15-minute session, I noticed that the cost for audio input was $5.28, while the cost for audio output was $0.65.
This seems inconsistent with expected behavior. During the session, I spoke in very short sentences, while the model gave much longer responses. According to the Realtime pricing (per 1M tokens), audio input is billed at $40 and audio output at $80, so the output cost should be higher than the input cost.
By this logic, the cost for audio input in this session should be lower than $0.65, not $5.28.
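To put numbers on it, here is a rough sketch that backs out the token counts the dashboard costs would imply at the per-1M rates quoted above. It assumes each line item is billed purely at that rate. The dashboard rounds to the cent, so the results are approximate, and the helper name `implied_tokens` is just made up for this illustration.

```python
# Prices quoted in the post, in USD per 1M tokens (assumed to apply as-is).
AUDIO_INPUT_PRICE_PER_M = 40.0
AUDIO_OUTPUT_PRICE_PER_M = 80.0


def implied_tokens(cost_usd: float, price_per_million: float) -> int:
    """Hypothetical helper: approximate token count implied by a billed cost."""
    return round(cost_usd / price_per_million * 1_000_000)


audio_in = implied_tokens(5.28, AUDIO_INPUT_PRICE_PER_M)
audio_out = implied_tokens(0.65, AUDIO_OUTPUT_PRICE_PER_M)

# Ratio of billed audio input tokens to billed audio output tokens.
print(audio_in, audio_out, round(audio_in / audio_out, 1))  # 132000 8125 16.2
```

In other words, the bill corresponds to roughly 16 times more audio tokens on the input side than on the output side, the opposite of what the short spoken turns would suggest.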
We use the OpenAI dashboard. Below is the full breakdown for the 15-minute session, with the associated costs.
realtime api | gpt-4o-realtime-preview-2024-12-17 audio, input: $5.28
realtime api | gpt-4o-realtime-preview-2024-12-17 audio, cached input: $0.57
realtime api | gpt-4o-realtime-preview-2024-12-17 audio, output: $0.65
realtime api | gpt-4o-realtime-preview-2024-12-17 text, input: $0.43
realtime api | gpt-4o-realtime-preview-2024-12-17 text, cached input: $0.43
realtime api | gpt-4o-realtime-preview-2024-12-17 text, output: $0.05
gpt-4o-transcribe audio, input: <$0.01
gpt-4o-transcribe text, input: <$0.01
gpt-4o-transcribe text, output: <$0.01
text-embedding-3-small: <$0.01
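For a quick sanity check, this small sketch totals the realtime line items from the dashboard and shows how much of the session cost sits on the input side. It uses `Decimal` to avoid floating-point rounding on the cent values and leaves out the four "<$0.01" items, since their exact amounts are not shown.

```python
from decimal import Decimal

# Realtime line items copied from the dashboard breakdown (USD).
# The "<$0.01" transcribe and embedding items are left out.
realtime_costs = {
    ("audio", "input"): Decimal("5.28"),
    ("audio", "cached input"): Decimal("0.57"),
    ("audio", "output"): Decimal("0.65"),
    ("text", "input"): Decimal("0.43"),
    ("text", "cached input"): Decimal("0.43"),
    ("text", "output"): Decimal("0.05"),
}

total = sum(realtime_costs.values(), Decimal("0"))

# Everything that is not an "output" item counts as input side (incl. cached).
input_side = sum(
    (cost for (_, kind), cost in realtime_costs.items() if kind != "output"),
    Decimal("0"),
)
audio_input_share = realtime_costs[("audio", "input")] / total

print(f"total ${total}, input side ${input_side}, audio input {float(audio_input_share):.0%}")
```

By this tally the session comes to about $7.41, of which $6.71 is input-side, and uncached audio input alone is roughly 71% of the total.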
For reference, we are in a quiet environment and manually activate the microphone by holding down a button when speaking.
Has anyone experienced something similar and figured out what was going on?
Thanks