Realtime API pricing is wrong, will overcharge

I just used the Realtime API for the first time and I’m shocked. I used it literally for just 75 seconds and was billed almost $6 for it.

Here is my actual usage:
11,182 Input tokens
1,680 Output tokens

Transcribed Seconds (Audio): 75

It is unclear whether the input and output tokens are text, audio, or both, but I'll add up all of it for you now, even though that technically double-counts.

So:
Input token price (Audio): 11,182 / 1 million * $100 = $1.12
Output token price (Audio): 1,680 / 1 million * $200 = $0.34

Input token price (Text): 11,182 / 1 million * $5 = $0.06
Output token price (Text): 1,680 / 1 million * $20 = $0.03

Audio transcription price ($0.006 per minute): 75 seconds is exactly 1.25 minutes, so:
1.25 * $0.006 = $0.0075, so about $0.01

If I add all these up, I get about $1.55.
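
Or, as a quick sanity-check script (same numbers, same published rates; counting every token as both audio and text is deliberate double-counting, to get an upper bound):

```python
# Upper-bound estimate from my usage page numbers. Counting all tokens
# as BOTH audio and text double-counts on purpose, to get a ceiling.
input_tokens = 11_182
output_tokens = 1_680
transcribed_seconds = 75

cost = (
    input_tokens / 1e6 * 100            # audio input, $100 / 1M tokens
    + output_tokens / 1e6 * 200         # audio output, $200 / 1M tokens
    + input_tokens / 1e6 * 5            # text input, $5 / 1M tokens
    + output_tokens / 1e6 * 20          # text output, $20 / 1M tokens
    + transcribed_seconds / 60 * 0.006  # transcription, $0.006 / minute
)
print(f"${cost:.2f}")  # -> $1.55
```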

I was billed exactly $5.28. That is more than 3x what the pricing page suggests. Can anyone confirm this observation? Can OpenAI have a look at their pricing model?

2 Likes

The server-side management of a chat session continues to grow and grow in length. Every response you get, each time the AI detects your voice input has ended, incurs another "input tokens" fee calculated on that entire accumulated session, with the potential to grow to 100,000+ input tokens per "are you still there?".

We should still expect the usage page, and its "activity" tab showing reported token usage, to correspond directly to the token costs. However, the cost reporting is less transparent than one would like, being aggregated over a day and combined with other models.

1 Like

In order to have any idea what is going on with your request, we'd need a great deal more information than you've provided.

  • How many exchanges happened during this 75 seconds?
  • Did you interrupt the model several times?
  • What type of responses were you asking for?
  • How long was your system message?

But if 75 seconds is billed at $5.28, that extrapolates, assuming linear growth (which you explained is not the case), to about $253 an hour. That can't actually be intentional, can it? Even if the token numbers on the usage page are wrong, I spent a maximum of 4 minutes with the API. Even when I ignore the usage page and go with that, $5.28 for 4 minutes is about $79 an hour, almost an order of magnitude higher than minimum wage. Something must be going wrong, no?

1 Like

Sure, the system message was the regular one that is set when you open the playground. I had around 10 or so exchanges with the model, asking it for simple little stories for kids. I did interrupt it several times as part of these exchanges.

But even with all this info, shouldn’t my usage page reflect my actual usage? Why is it relevant how I used the model when my usage page is showing me an exact quantity of how many resources I used? If the numbers on my usage page aren’t to be trusted here, then what is that page for?

I can’t see what your usage page shows, so I can’t speak to that.

What I can tell you is that you're charged for all tokens generated, whether you hear them or not.

So, if you ask the model for a long story, it might finish its computation after 10 seconds, producing audio that will take 3 minutes to play back. If you interrupt the playback after 15 seconds, you still pay for the full 3 minutes' worth of generation.

Now, your voice input is tokenized at 10 tokens per second of audio, and is also tokenized as text at about 1.3x the o200k_base token count. Output audio is tokenized at 20 tokens per second of audio, with the same text tokenization at about 1.3x the normal rate. System messages are tokenized as plain text.

So, when you’re doing a lot of interruptions, unless you’re actively truncating and culling all the audio tokens which you’re not really using, those will pile up quickly.

If you’re doing 10 rapid exchanges, whatever the output token count of your first response will be included 9-times as audio input tokens, the second will be included 8-times, etc. So, if you terminate a 2,000 audio token response early and don’t remove it, that will add up very quickly if you keep doing rapid-fire exchanges.

2 Likes

Well, I detailed what my usage panel shows in this post. It shows this:
[screenshot of usage panel]
And this:
[screenshot of usage panel]
And based on this usage, I calculated the price.

I understand that there may be mechanisms by which my usage may lead to more tokens being used than I would expect, but my issue here is that the page that shows how many tokens I used does not correspond to the amount of money I was charged. I would expect the “unexpected” token pileup you’re describing to still show up as usage in my usage panel.

1 Like

It really should, shouldn’t it?

One possibility is that the accounting is still trickling in. Full usage has been observed not to appear for as long as a day, retroactively increasing the previous day's figures. The billing, even at the API-call level, is outsourced, and the calls the usage page itself makes go to that outsourced provider, Stripe, where quantities of tokens are added as line items to a bill. Don't expect realtime, or any other model, to have realtime billing. (Exact billing in 5-minute increments was shut off hours after Altman announced "Assistants" at DevDay 2023, with the same opaque billing problems.)

Secondly, those pages show per-project usage by default, for the selected project, and might not include user keys. You'll have to check key management, or switch to "entire organization" to find the billing in some cases.

Then finally, a lesson that should have been instilled long ago in users of any chat API: conversations grow, accumulating a context of what has been discussed before. This is fed back into the model for each new response to give an illusion of memory. The same is done with voice: the AI is constantly "listening" to all the tokens received before while it crafts a new response. The realtime API offers no management solution except to hang up the connection and start again.

2 Likes

I suppose I will check back tomorrow to see if the usage has changed at all. But even then it doesn’t make much sense to me that my one session with the API would be cut in half when it comes to reporting. The usage is already there and has been the same for an hour or so. But this may be the likeliest explanation.

I am aware it’s per project use. I have the only project selected that I have used the API with and I used the playground, so no need to check any keys. Org level usage shows the same figures.

I don’t need this lesson, this is something I understand. I have no problem with tokens piling up quickly. What I have a problem with is that the tokens I DID use don’t add up to the price I was charged. Again, this may be due to the delayed update you described, but we shall see. Will update the thread tomorrow.

This would be your culprit.

For this reason.

The Realtime API can get pricey. Most new toys are. You're playing a very dangerous game by asking it to output stories: a model like GPT can go on for paragraphs.

You most likely ended up paying for a number of short kid stories and never heard them because you interrupted the model.

The metrics that you are showing can't be considered reliable. The "transcribed" seconds may only cover a fraction of the audio, since you interrupted the process (Whisper is used on the audio output for transcription).

Would you kindly read my responses here? The fact that tokens add up is not the issue. I’ve been using the assistants API since it was released, I understand how context works and that it grows. I understand that tokens add up.

Even if the transcribed seconds in reality were 5x as high, I was still overcharged.

1 Like

You misunderstood the way tokens are charged and paid the (still pretty inexpensive) price.

The transcribed seconds are not a reliable metric for conversation time.

Here’s a tip which will help you debug your billing a bit.

  1. Go to your organizational overview page in a chromium browser: https://platform.openai.com/organization/usage
  2. Open the DevTools console
  3. Refresh the page
  4. Open the Network tab in the DevTools console
  5. In the left pane, find the calls going to activity and usage
  6. Right-clicking on these will allow you to download the JSON coming from the backend API

These JSON responses will contain a bit more information about your usage and activity than you can see directly in the dashboard.
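
If it helps, here's a minimal sketch for skimming one of those downloaded files (the structure and field names below are assumptions; rename them to match whatever keys the payload actually uses):

```python
import json

# Flatten a downloaded usage/activity JSON into rows you can eyeball.
# A top-level "data" list is an assumption about the payload shape.
with open("usage.json") as f:
    payload = json.load(f)

for entry in payload.get("data", []):
    # Keep only fields that look relevant to token/audio accounting.
    row = {k: v for k, v in entry.items()
           if "token" in k or "model" in k or "second" in k}
    print(row)
```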

Hopefully that’s enough for you to be able to audit your usage and billing.

I, personally, haven’t ever seen the billing not add up.

2 Likes

Would you explain how I misunderstood what exactly? I looked at my usage page, which is supposed to reflect how many resources I used exactly, and the pricing page to calculate what I should have been charged and found that it is much lower than what I was actually charged.

Again, I do understand that tokens add up and when you interrupt it, you still pay for the generation that was in progress, even when the output is interrupted. None of that is relevant because I looked directly at the token usage on my usage page to calculate my price. If I “misunderstood the way tokens are charged,” this would imply that tokens are charged without showing up on the usage page, that the token overhead that people keep explaining to me is charged in secret.

Yeah. It’s pretty frustrating. I’ve also had some difficulties trying to match cost to the usage page. It should trickle in as @_j said. It may be worthwhile to implement your own cost tracker.
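
Something like this could be a starting point (a sketch only: the usage fields on the `response.done` server event and the per-million rates are assumptions to verify against the current API reference and pricing page):

```python
# Accumulate the usage block that the Realtime API reports on each
# "response.done" server event and price it locally.
RATES = {"text_in": 5, "audio_in": 100, "text_out": 20, "audio_out": 200}

class CostTracker:
    def __init__(self):
        self.tokens = {key: 0 for key in RATES}

    def on_event(self, event: dict):
        """Feed every decoded server event through here."""
        if event.get("type") != "response.done":
            return
        usage = event.get("response", {}).get("usage", {})
        ins = usage.get("input_token_details", {})
        outs = usage.get("output_token_details", {})
        self.tokens["text_in"] += ins.get("text_tokens", 0)
        self.tokens["audio_in"] += ins.get("audio_tokens", 0)
        self.tokens["text_out"] += outs.get("text_tokens", 0)
        self.tokens["audio_out"] += outs.get("audio_tokens", 0)

    @property
    def dollars(self) -> float:
        return sum(self.tokens[k] / 1e6 * RATES[k] for k in RATES)
```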

1 Like

I will definitely keep an eye on it. Unfortunately, the data from the network tab doesn’t show anything that suggests more usage than is already displayed. If it trickles in tomorrow or the day after, I’ll update this thread for future reference. If it doesn’t add up even then, I suppose this is a case for support@openai.com.

Those JSON responses should also give you some slightly easier-to-parse billing information, and you can take the information in the two of them to generate a nice clean table showing your activity and billing, which will be easier for people to understand and interpret.

1 Like

I think this is my quote?

Hopefully you can use the information here to figure out what happened. In my tests the tokens are tracked as expected. It's pricey, especially if you interrupt a lot, but it's not overcharging in the sense of charging more than indicated.

1 Like

I also found it very expensive: $30 for an hour's use.

1 Like

I’d love your org info if you’re seeing problems here (feel free to DM me)

One helpful debugging step: in the Playground, you can see the total tokens of each type consumed during your entire session at the top of the Logs.

The way costs accumulate as sessions go on is a top priority for us to address at the moment. We're working on adding caching so that we can heavily discount audio + text that's been encountered already.

6 Likes