Hi,
The Realtime API (like any OpenAI API) was and still is priced based on tokens used. The only difference from the "old" pricing is that OpenAI used to publish an estimate of what the token pricing would roughly work out to per minute. But the charge was always by tokens.
However, that estimate was very inaccurate, which is why I think they just removed it with the new pricing.
So, let's break down some of your other questions:
-
A one-hour conversation is not possible; at the moment I think the session limit is 30 minutes (or maybe 15), it's somewhere in the docs.
-
From my testing of a "natural" phone conversation of about 2 minutes, the cost is about USD 0.09 for the 4o-mini realtime model and around USD 0.21-0.25 for the 4o realtime model.
BUT don't forget that with every input, the whole conversation so far is sent to the model again, so the input cost of each turn keeps growing as the conversation gets longer (the total grows roughly quadratically, not linearly). Most of it should be covered by cache hits, but still.
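To make the growth concrete, here is a rough back-of-the-envelope sketch (my own illustration, not OpenAI's billing logic), assuming each turn adds a fixed number of new tokens and the full history is resent on every turn, with caching ignored:

```python
def total_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Sum of the history length sent on each turn (caching ignored).

    On turn t, the whole conversation so far (t * tokens_per_turn
    tokens) is sent as input, so the totals add up quadratically.
    """
    return sum(turn * tokens_per_turn for turn in range(1, turns + 1))

# 10 turns at ~200 new tokens each: only 2,000 "new" tokens exist,
# but resending the history means 11,000 input tokens get billed.
print(total_input_tokens(10, 200))  # 11000
```

In practice cached input tokens are billed at a lower rate, so the real bill grows more slowly than this worst case, but the shape of the curve is the same.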
Another cost factor is interrupting the model. All response tokens are generated while you are still listening to the response. So if a large chunk of a response was already generated but you cut it off at the first word, you still pay for those unused but generated output tokens.
The best way to evaluate a specific use case is to go to the playground, simulate a few conversations, and then look at usage/billing to see how many tokens are used, how many were cached, and what the conversations are costing you. Look at the logs for the session ID and compare with the detailed usage export.
Quick Example:
User: This is a session.
Assistant: Hi there! What’s on your mind today?
Usage was:
Audio Token Input: 12
Text Token Input: 759
Audio Token Out: 41
Text Token Out: 19
Hope that helps