Estimate the cost for 1 min usage of Real-time API

nandhu.nr07 · November 15, 2024, 1:13pm

Can someone help me calculate the cost of using the Real-time API for 1 minute? A rough estimate is fine. Assume a normal conversation.

cortigeronimo · November 15, 2024, 5:06pm

On average it is 6 cents per minute for audio input and 24 cents per minute for audio output.
https://openai.com/api/pricing/

_j · November 15, 2024, 5:37pm

That does not account for the fact that each triggered response adds audio to the chat history, so the audio input grows every time the AI responds, which can happen multiple times within a minute. You are paying over and over for audio that was already spoken earlier in the conversation.
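
A minimal sketch of that effect, using the per-minute figures quoted above as the rates. It assumes that every response re-bills the whole prior audio history (user and assistant) as input, that there is no caching, and that every turn has the same length. The function names and the turn lengths are made up for illustration.

```python
# Rates quoted earlier in this thread (USD per minute of audio); treat as rough.
INPUT_PER_MIN = 0.06
OUTPUT_PER_MIN = 0.24

def naive_cost(user_sec, assistant_sec, turns):
    """Bill every second of audio exactly once."""
    return turns * (user_sec * INPUT_PER_MIN + assistant_sec * OUTPUT_PER_MIN) / 60

def cumulative_cost(user_sec, assistant_sec, turns):
    """Assumption: the full audio history is billed again as input on each response."""
    total = 0.0
    history_sec = 0.0
    for _ in range(turns):
        history_sec += user_sec                       # new user audio joins the history
        total += history_sec * INPUT_PER_MIN / 60     # whole history billed as input
        total += assistant_sec * OUTPUT_PER_MIN / 60  # new reply billed as output
        history_sec += assistant_sec                  # the reply joins the history too
    return total

# One minute of talk: 6 turns, 5 s of user audio and 5 s of reply per turn.
print(f"naive:      ${naive_cost(5, 5, 6):.2f}")
print(f"cumulative: ${cumulative_cost(5, 5, 6):.2f}")
```

With these made-up turn lengths the re-billed history roughly doubles the naive figure for the first minute, and the gap keeps widening as the conversation gets longer.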

ivan-luchkin-u · November 15, 2024, 6:02pm

There are multiple factors to this.

The price per minute always depends on the number of turns in the conversation. As the number of turns increases, the price increases too, because the conversation history grows and the model consumes all of it again on each new turn.
Prompt caching should be included in the equation as well, because all or most tokens from previous turns will hit the cache (both audio and text, both input and output).
The context window of the model is 128k tokens, but the actual output window is only 4096 tokens. The average number of input tokens, the average number of output tokens (which are priced differently), and the overall ratio between them are a major factor.
Function calls should also be considered, because functions can be rather large depending on the use case. If the model calls them multiple times, or calls multiple functions per turn (which may not be supported at this time, don't quote me on that), token usage will increase significantly.
Tokenization of text (and maybe audio, not known at this time) has different efficiency depending on the language of the input. For the tokenizers used in OpenAI models, English is the most efficient language (i.e. the fewest tokens per amount of text). This means that if text (and maybe audio) is tokenized very inefficiently, the number of tokens, and with it the price of the turn or session, will increase.
Model behavior itself can produce extra turns (verifying user input, making sure the assistant heard the user correctly, etc.), which will increase usage.
The model can "glitch", hallucinate, call functions erroneously, or produce an enormous amount of output, and users can attempt prompt injection, force the model out of bounds, etc. All of this is also a major cost factor.
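
To show how the first two factors interact, here is a rough sketch that bills new user audio at the full input rate and the previously seen history at a discounted cached rate. The rates are the per-minute figures quoted earlier in this thread. `cached_price_ratio` is a hypothetical parameter, not an official number, and the sketch assumes the entire prior history hits the cache on every turn, which is the optimistic case.

```python
def cost_with_cache(user_sec, assistant_sec, turns,
                    cached_price_ratio,
                    input_per_min=0.06, output_per_min=0.24):
    """Estimate session cost in USD when prior history is billed at a cached rate.

    cached_price_ratio is hypothetical: 1.0 means no discount on history,
    0.0 means history is free. Assumes all prior audio hits the cache.
    """
    total = 0.0
    history_sec = 0.0  # audio already seen by the model in earlier turns
    for _ in range(turns):
        # Prior history is re-consumed as input, at the discounted rate.
        total += history_sec * input_per_min * cached_price_ratio / 60
        # New user audio is billed at the full input rate.
        total += user_sec * input_per_min / 60
        # The reply is billed at the output rate.
        total += assistant_sec * output_per_min / 60
        history_sec += user_sec + assistant_sec
    return total

# Compare a few hypothetical discount levels for a one-minute, 6-turn exchange.
for ratio in (1.0, 0.5, 0.0):
    print(f"ratio {ratio}: ${cost_with_cache(5, 5, 6, ratio):.3f}")
```

Function definitions, text tokens, and the extra turns mentioned above are not modeled here, so real sessions will likely cost more than this estimate.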

getinference · November 20, 2024, 5:49pm

In my experience, after hours of conversation so far, it comes to about $1 per minute.

markojak · January 9, 2025, 4:24pm

That’s not very practical for most implementations. You would need to target a use case where the cost of having a human speak is much higher, but then you have a mismatch between existing capabilities and price.
