I just used the Realtime API for the first time and I’m shocked. I used it literally for just 75 seconds and was billed almost $6 for it.
Here is my actual usage:
11,182 Input tokens
1,680 Output tokens
Transcribed Seconds (Audio): 75
It is unclear whether the input and output tokens are text, audio, or both, but I’ll add all of it up for you now, even though I would technically be double-billing myself.
So:
Input token price (Audio): 11,182 / 1 million * $100 = $1.12
Output token price (Audio): 1,680 / 1 million * $200 = $0.34
Audio transcription price ($0.006 per minute): 75 seconds is exactly 1.25 minutes, so:
1.25 * $0.006 = $0.0075, call it $0.01
If I add all these up, I get about $1.46.
I was billed exactly $5.28. That is more than 3.5x what the pricing page suggests. Can anyone confirm this observation? Can OpenAI have a look at their pricing model?
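For anyone who wants to sanity-check this, here is the same arithmetic as a small script. It is only a sketch of my reading of the pricing page; the per-million and per-minute rates are the ones quoted above, not anything authoritative.

```python
# Re-deriving the expected bill from the usage figures above. The rates are
# the ones quoted in this post ($100/1M audio input tokens, $200/1M audio
# output tokens, $0.006 per transcribed minute); treat them as my reading of
# the pricing page, not an official statement.

AUDIO_INPUT_PER_1M = 100.00       # $ per 1M audio input tokens
AUDIO_OUTPUT_PER_1M = 200.00      # $ per 1M audio output tokens
TRANSCRIPTION_PER_MIN = 0.006     # $ per transcribed minute

input_tokens = 11_182
output_tokens = 1_680
transcribed_seconds = 75

input_cost = input_tokens / 1_000_000 * AUDIO_INPUT_PER_1M             # ~$1.12
output_cost = output_tokens / 1_000_000 * AUDIO_OUTPUT_PER_1M          # ~$0.34
transcription_cost = transcribed_seconds / 60 * TRANSCRIPTION_PER_MIN  # ~$0.0075

expected = input_cost + output_cost + transcription_cost
print(f"expected: ${expected:.2f} vs. billed: $5.28")   # expected: $1.46 vs. billed: $5.28
```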
The server-side chat session just keeps growing in length. Every response you get, every time the AI detects that your speech has ended, incurs another “input tokens” charge calculated on that full context, which can grow to 100,000+ input tokens per “are you still there?”.
We should still expect the usage page, and the reported token usage on its “activity” tab, to line up directly with the token costs. However, the expense reporting is less transparent than one would like: it is aggregated over a day and combined with other models.
But if 75 seconds is billed at $5.28, then assuming linear growth (which you explained is not the case), that works out to roughly $253 an hour. That can’t actually be intentional, can it? Even if the token numbers on the usage page are wrong, I spent at most 4 minutes with the API. Even if I ignore the usage page and go with that, $5.28 for 4 minutes is about $80 an hour, roughly an order of magnitude above minimum wage. Something must be going wrong, no?
Sure, the system message was the regular one that is set when you open the playground. I had around 10 exchanges with the model, asking it for simple little stories for kids. I did interrupt it several times during these exchanges.
But even with all this info, shouldn’t my usage page reflect my actual usage? Why does it matter how I used the model, when my usage page shows me an exact count of the resources I used? If the numbers on my usage page aren’t to be trusted here, then what is that page for?
I can’t see what your usage page shows, so I can’t speak to that.
What I can tell you is that you’re charged for all tokens generated, whether you hear them or not.
So, if you ask the model for a long story, it might finish its computation in 10 seconds while producing audio that takes 3 minutes to play back. If you interrupt the model after 15 seconds, you still pay for the full 3 minutes’ worth of generation.
Now, your voice input is tokenized at 10 tokens per second of audio, and it is also tokenized as text at about 1.3x the o200k_base token count. Output audio is tokenized at 20 tokens per second of audio and gets the same text tokenization at about 1.3x the normal rate. System messages are just tokenized as text.
So, when you’re doing a lot of interruptions, unless you’re actively truncating and culling the audio tokens you aren’t really using, those will pile up quickly.
If you’re doing 10 rapid exchanges, the output tokens of your first response will be counted 9 more times as audio input tokens, the second response 8 more times, and so on. So if you cut a 2,000-audio-token response short and don’t remove it from the conversation, the cost adds up very quickly once you keep doing rapid-fire exchanges.
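To make that concrete, here is a rough back-of-the-envelope model of the pile-up. It is only a sketch: the per-second token rates are the ones claimed above, and the turn lengths are made-up assumptions, not measurements.

```python
# Rough model of how untruncated audio context piles up over rapid exchanges.
# The token rates (10 tok/s input audio, 20 tok/s output audio) are the ones
# claimed above; the turn lengths are illustrative assumptions only.

IN_TOK_PER_SEC = 10
OUT_TOK_PER_SEC = 20

user_seconds_per_turn = 5    # assume you speak ~5 s per exchange
response_seconds = 100       # assume each story is ~100 s of audio (~2,000 tokens)

context_tokens = 0           # audio tokens carried forward in the conversation
billed_input_tokens = 0      # what actually gets billed as input, turn after turn

for turn in range(1, 11):    # 10 rapid exchanges
    context_tokens += user_seconds_per_turn * IN_TOK_PER_SEC
    billed_input_tokens += context_tokens                   # whole context is input every turn
    context_tokens += response_seconds * OUT_TOK_PER_SEC    # full response stays in context,
                                                            # even if playback was interrupted

print(billed_input_tokens)   # ~92,750 input tokens after just 10 turns
```

If you truncate or delete the interrupted responses as you go, that running total stays far smaller.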
Well, I detailed what my usage panel shows in this post: the token counts and the transcribed seconds quoted at the top. And based on that usage, I calculated the price.
I understand that there may be mechanisms by which my usage leads to more tokens being consumed than I would expect, but my issue here is that the token counts shown on my usage page do not correspond to the amount of money I was charged. I would expect the “unexpected” token pileup you’re describing to still show up as usage on my usage panel.
One possibility is that the accounting is still trickling in. Full usage has been observed not to show up for as long as a day, retroactively increasing the previous day’s figures. The billing, even at the API-call level, is outsourced, and the calls the usage page itself makes go to that outsourced provider, Stripe, where quantities of tokens are added as line items to a bill. Don’t expect realtime, or any other model, to have real-time billing (exact billing in 5-minute increments was shut off hours after Altman announced “Assistants” at DevDay 2023, with the same opaque billing problems).
Secondly, those pages show per-project usage by default, for the selected project, and might not include user keys. You’ll have to check key management, or in some cases switch to “entire organization” to find the billing.
Then finally, a lesson that should have been instilled long ago in users of any chat API: conversations grow, accumulating a context of everything discussed before. That context is fed back into the model for each new response to create the illusion of memory. The same is true for voice: the AI is constantly “listening” to all the tokens received so far when it crafts a new response. The Realtime API offers no management solution for this except hanging up the connection and starting again.
I suppose I will check back tomorrow to see if the usage has changed at all. But even then, it doesn’t make much sense to me that my one session with the API would be reported at a fraction of its real size. The usage figures have already been sitting there unchanged for an hour or so. But this may be the likeliest explanation.
I am aware it’s per-project usage. I have selected the only project I have used the API with, and I used the playground, so there are no keys to check. Org-level usage shows the same figures.
I don’t need this lesson; this is something I understand. I have no problem with tokens piling up quickly. What I have a problem with is that the tokens I DID use don’t add up to the price I was charged. Again, this may be due to the delayed update you described, but we shall see. I will update the thread tomorrow.
The Realtime API can get pricey. Most new toys are. You’re playing a very dangerous game by asking it to output stories; a model like GPT can go on for paragraphs.
You most likely ended up paying for a number of short kid stories and never heard them because you interrupted the model.
The metrics you are showing can’t be considered reliable. The “transcribed” seconds may only cover a fraction of the audio, since you interrupted the process (Whisper is used on the audio output for transcription).
Would you kindly read my responses here? The fact that tokens add up is not the issue. I’ve been using the Assistants API since it was released; I understand how context works and that it grows. I understand that tokens add up.
Even if the transcribed seconds in reality were 5x as high, I was still overcharged.
Would you explain what exactly I misunderstood? I looked at my usage page, which is supposed to reflect exactly how many resources I used, and at the pricing page, calculated what I should have been charged, and found that it is much lower than what I was actually charged.
Again, I do understand that tokens add up, and that when you interrupt the model you still pay for the generation that was in progress, even though the output was cut off. None of that is relevant, because I looked directly at the token usage on my usage page to calculate my price. If I “misunderstood the way tokens are charged,” that would imply tokens are charged without showing up on the usage page, i.e. that the token overhead people keep explaining to me is charged in secret.
Yeah, it’s pretty frustrating. I’ve also had some difficulties trying to match cost to the usage page. It should trickle in, as @_j said. It may be worthwhile to implement your own cost tracker.
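Something along these lines is enough to start with. It is only a sketch: it assumes each `response.done` server event carries a `usage` object with text/audio token breakdowns, and the field names and per-million rates below are assumptions to verify against the event payloads you actually receive and the current pricing page.

```python
# Minimal client-side cost tracker, a sketch only. It assumes each
# `response.done` server event carries a `usage` object broken down into
# text and audio tokens; the field names and the per-million rates below
# are assumptions to verify against your own event payloads and the
# current pricing page.

RATES = {                      # $ per 1M tokens (assumed values)
    "input_text": 5.00,
    "input_audio": 100.00,
    "output_text": 20.00,
    "output_audio": 200.00,
}

def response_cost(event: dict) -> float:
    """Dollar cost of a single `response.done` event under the rates above."""
    usage = event.get("response", {}).get("usage", {})
    in_d = usage.get("input_token_details", {})
    out_d = usage.get("output_token_details", {})
    return (
        in_d.get("text_tokens", 0) / 1e6 * RATES["input_text"]
        + in_d.get("audio_tokens", 0) / 1e6 * RATES["input_audio"]
        + out_d.get("text_tokens", 0) / 1e6 * RATES["output_text"]
        + out_d.get("audio_tokens", 0) / 1e6 * RATES["output_audio"]
    )

# Example with a made-up usage payload (numbers are illustrative only):
example_event = {
    "type": "response.done",
    "response": {
        "usage": {
            "input_token_details": {"text_tokens": 500, "audio_tokens": 4_000},
            "output_token_details": {"text_tokens": 120, "audio_tokens": 2_000},
        }
    },
}
print(f"${response_cost(example_event):.4f}")   # $0.8049
```

Summing `response_cost()` over every `response.done` event in a session gives you a running figure to compare against what the usage page eventually shows.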
I will definitely keep an eye on it. Unfortunately, the data from the network tab doesn’t show anything that suggests more usage than is already displayed. If it trickles in tomorrow or the day after, I’ll update this thread for future reference. If it doesn’t add up even then, I suppose this is a case for support@openai.com.
It should also give you billing information that is slightly easier to parse, and you can take the information in those two responses to generate a nice, clean table showing your activity and billing, which will be easier for people to understand and interpret.
Hopefully you can use the information here to figure out what happened. In my tests the tokens are tracked as expected. It’s pricey, especially if you interrupt a lot, but it’s not overcharging in the sense of charging more than indicated.