Confusion Regarding Tokenization Calculation in Realtime API and Potential Double Charging Concerns

Many people have complained about high charges for the Realtime API, and I'm also confused about how tokens are counted in the API.

The first response message I got from the Realtime API was: “Hi there! How can I assist you today?”

The usage reports 20 output text tokens:

      "output_token_details": {
        "text_tokens": 20,
        "audio_tokens": 47
      }

But the tokenizer tool at https://platform.openai.com/tokenizer counts only 10 tokens for the same text.

Is the count being doubled, and are we being charged double?

Below is a “response.done” event from the Realtime API:

{
  "type": "response.done",
  "event_id": "****",
  "response": {
    "object": "realtime.response",
    "id": "*****",
    "status": "completed",
    "status_details": null,
    "output": [
      {
        "id": "*****",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
          {
            "type": "audio",
            "transcript": "Hi there! How can I assist you today?"
          }
        ]
      }
    ],
    "usage": {
      "total_tokens": 577,
      "input_tokens": 510,
      "output_tokens": 67,
      "input_token_details": {
        "cached_tokens": 0,
        "text_tokens": 510,
        "audio_tokens": 0
      },
      "output_token_details": {
        "text_tokens": 20,
        "audio_tokens": 47
      }
    }
  }
}
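For reference, here is a minimal Python sketch that checks how the usage fields in this event relate to one another. The `check_usage` helper is a hypothetical name introduced only for illustration, and the numbers are copied from the event above.

```python
import json

# Usage block copied from the response.done event above.
USAGE_JSON = """
{
  "total_tokens": 577,
  "input_tokens": 510,
  "output_tokens": 67,
  "input_token_details": {
    "cached_tokens": 0,
    "text_tokens": 510,
    "audio_tokens": 0
  },
  "output_token_details": {
    "text_tokens": 20,
    "audio_tokens": 47
  }
}
"""


def check_usage(usage: dict) -> dict:
    """Hypothetical helper: report whether the per-type counts add up."""
    inp = usage["input_token_details"]
    out = usage["output_token_details"]
    return {
        # input + output should equal the reported total
        "total_matches": usage["input_tokens"] + usage["output_tokens"]
        == usage["total_tokens"],
        # text + audio should equal the reported input count
        "input_matches": inp["text_tokens"] + inp["audio_tokens"]
        == usage["input_tokens"],
        # text + audio should equal the reported output count
        "output_matches": out["text_tokens"] + out["audio_tokens"]
        == usage["output_tokens"],
    }


usage = json.loads(USAGE_JSON)
checks = check_usage(usage)
print(checks)
```

For this event all three checks hold: the 67 output tokens are 20 text tokens plus 47 audio tokens, so the question is only about the 20 text tokens.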

There are two things at play here:

  1. The Realtime API appears to use a slightly different tokenizer than o200k_base, one that needs about 30% more tokens for the same amount of text.
  2. The count also includes the control tokens that mark which entity (system, user, assistant) each message comes from. That is only about 4 tokens per message, but on short messages it is a sizable share of the total.
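
As a rough back-of-the-envelope check, the two factors above can be applied to the 10 tokens the tokenizer tool reports. This is only a sketch: the 30% and 4-token figures are the approximate estimates from this reply, not documented values, and `estimate_text_tokens` is a hypothetical helper name.

```python
def estimate_text_tokens(tokenizer_count: int,
                         inflation_percent: int = 30,
                         overhead_per_message: int = 4,
                         messages: int = 1) -> int:
    """Estimate billed text tokens from a tokenizer-tool count.

    Assumptions (rough figures, not documented values):
      - the API's tokenizer needs about `inflation_percent` more tokens
      - each message carries about `overhead_per_message` control tokens
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    inflated = -(-tokenizer_count * (100 + inflation_percent) // 100)
    return inflated + overhead_per_message * messages


tool_count = 10   # count from the tokenizer tool
reported = 20     # text_tokens in the response.done event
estimate = estimate_text_tokens(tool_count)
print(f"estimate={estimate}, reported={reported}, gap={reported - estimate}")
```

With these figures the estimate comes to 17 against the reported 20. Both inputs are approximate, so an exact match isn't expected.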

In short, no, you aren’t being double billed.

Would love more info if you’re continuing to see problems. One helpful check is the Playground, where you can see the total tokens of each type consumed during your entire session at the top of the Logs.