What you are looking at here is an assistant that is being told it previously emitted first one, and then progressively more, tool calls. So we need a little loop to automate building the requests:
from openai import OpenAI

client = OpenAI()

for toolcount in range(1, 6):
    # one fabricated tool call the assistant supposedly made earlier
    call = {"id": "x", "type": "function",
            "function": {"name": "calculator", "arguments": "2"}}
    # assistant message replaying `toolcount` copies of that call
    assistant = {"role": "assistant", "tool_calls": []}
    # tool message returning the (fake) result for that call id
    tool = {"role": "tool", "tool_call_id": "x", "content": "3"}
    for _ in range(toolcount):
        assistant["tool_calls"].append(call)
    request = {"model": "gpt-3.5-turbo", "max_tokens": 1, "messages": []}
    request["messages"].append(assistant)
    request["messages"].append(tool)
    # print(request)
    a = client.chat.completions.with_raw_response.create(**request)
    chat_completion = a.parse()
    print(f" --{chat_completion.usage.prompt_tokens}")
 --18
 --62
 --80
 --98
 --116
What I see is a big jump when transitioning from one to two calls (+44 tokens), and then a constant rate of +18 tokens per additional call.
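A quick sanity check on those increments, using nothing but the prompt_tokens values printed above:

# differences between consecutive prompt_token counts from the run above
prompt_tokens = [18, 62, 80, 98, 116]
deltas = [b - a for a, b in zip(prompt_tokens, prompt_tokens[1:])]
print(deltas)  # [44, 18, 18, 18]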
My guess is that this reflects the internal language the AI itself uses: emitting multiple tool calls requires a much larger container format, so the AI transitions to a different method of writing them to the tool-recipient API backend, and replaying to the AI what it called in the past carries that same overhead.
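If that holds, you could budget for the overhead with a simple linear model. To be clear, this is just a fit to the five measurements above, not anything documented by OpenAI, and it may shift with model or tokenizer changes:

def estimated_prompt_tokens(n_calls: int) -> int:
    """Rough fit to the measurements above (speculative, not an official formula)."""
    if n_calls <= 1:
        return 18                 # single-call container
    return 26 + 18 * n_calls      # multi-call container: 62, 80, 98, 116 for n = 2..5

print([estimated_prompt_tokens(n) for n in range(1, 6)])  # [18, 62, 80, 98, 116]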