Inconsistent token billing for tool_calls in gpt-3.5-turbo-1106

ai4j · December 7, 2023, 9:10am

Hi, I am sending the following request:

{
  "model": "gpt-3.5-turbo-1106",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is the time now?"
        }
      ]
    }
  ],
  "temperature": 0.0,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "check_watch",
        "description": "returns current time",
        "parameters": {
          "type": "object",
          "properties": {},
          "required": []
        }
      }
    }
  ]
}

And getting two different completion_tokens calculations for the same responses. Sometimes it is 10, sometimes 27:

{
    "id": "chatcmpl-8T4fbHQI5v8fhQEgm4uY6UDzzrFgg",
    "object": "chat.completion",
    "created": 1701940155,
    "model": "gpt-3.5-turbo-1106",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": null,
                "tool_calls": [
                    {
                        "id": "call_2b39dQtuHVAX17Yx610fFpSe",
                        "type": "function",
                        "function": {
                            "name": "check_watch",
                            "arguments": "{}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ],
    "usage": {
        "prompt_tokens": 42,
        "completion_tokens": 10,
        "total_tokens": 52
    },
    "system_fingerprint": "fp_eeff13170a"
}

{
    "id": "chatcmpl-8T4fssaSPN74eSQQO7gZ47gPw0XvF",
    "object": "chat.completion",
    "created": 1701940172,
    "model": "gpt-3.5-turbo-1106",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": null,
                "tool_calls": [
                    {
                        "id": "call_dQQJHC4zaBj8DwHUtEwtNHzs",
                        "type": "function",
                        "function": {
                            "name": "check_watch",
                            "arguments": "{}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ],
    "usage": {
        "prompt_tokens": 42,
        "completion_tokens": 27,
        "total_tokens": 69
    },
    "system_fingerprint": "fp_eeff13170a"
}

ai4j · December 7, 2023, 9:21am

One more example. But here the outputs are slightly different, notice a space in arguments between : and \:

{
  "model": "gpt-3.5-turbo-1106",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is the time in Berlin now?"
        }
      ]
    }
  ],
  "temperature": 0.7,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "watch",
        "description": "returns current time",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {}
          },
          "required": [
            "city"
          ]
        }
      }
    }
  ]
}

{
    "id": "chatcmpl-8T4oi1fsSAYlp545AS6L12tPmhF6I",
    "object": "chat.completion",
    "created": 1701940720,
    "model": "gpt-3.5-turbo-1106",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": null,
                "tool_calls": [
                    {
                        "id": "call_wMg5Atsz2N1SgrcuLIsT3AXZ",
                        "type": "function",
                        "function": {
                            "name": "watch",
                            "arguments": "{\"city\": \"Berlin\"}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ],
    "usage": {
        "prompt_tokens": 49,
        "completion_tokens": 28,
        "total_tokens": 77
    },
    "system_fingerprint": "fp_eeff13170a"
}

{
    "id": "chatcmpl-8T4otLkkGX6AToaHmnGJevuFl6x24",
    "object": "chat.completion",
    "created": 1701940731,
    "model": "gpt-3.5-turbo-1106",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": null,
                "tool_calls": [
                    {
                        "id": "call_3FgiPGqnONsmEyXSn5DH7MPr",
                        "type": "function",
                        "function": {
                            "name": "watch",
                            "arguments": "{\"city\":\"Berlin\"}"
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ],
    "usage": {
        "prompt_tokens": 49,
        "completion_tokens": 13,
        "total_tokens": 62
    },
    "system_fingerprint": "fp_eeff13170a"
}

Topic		Replies	Views
Discrepancy in Token Counts Between tiktoken and API Usage for o4-mini/gpt-4o-mini Bugs api	1	159	May 28, 2025
Strange token cost calculation for tool_calls API api	10	2696	February 3, 2024
Parallel tool calls in chat completions causes token count overestimation from the API Bugs function-calling , tools	1	304	December 23, 2024
[URGENT] GPT-4.1-mini consumes unknown amount of input tokens Bugs gpt-4 , api	8	513	May 1, 2025
Prompt tokens usage seems too high API api	1	2630	January 21, 2024

Inconsistent token billing for tool_calls in gpt-3.5-turbo-1106

Related topics