Strange token cost calculation for tool_calls

Here you are looking at an assistant that is being told it previously emitted first one, and then progressively more, tool calls.

So we need a little loop to automate building the requests:

from openai import OpenAI

client = OpenAI()

for toolcount in range(1, 6):
    # One prior tool call, supposedly emitted by the assistant
    call = {"id": "x", "type": "function",
            "function": {"name": "calculator", "arguments": "2"}}
    # Assistant message replaying 1..5 copies of that tool call
    assistant = {"role": "assistant", "tool_calls": []}
    # The tool's return value for that call
    tool = {"role": "tool", "tool_call_id": "x", "content": "3"}
    for _ in range(toolcount):
        assistant["tool_calls"].append(call)
    request = {"model": "gpt-3.5-turbo", "max_tokens": 1, "messages": []}
    request["messages"].append(assistant)
    request["messages"].append(tool)
    # print(request)

    a = client.chat.completions.with_raw_response.create(**request)
    chat_completion = a.parse()
    print(f" --{chat_completion.usage.prompt_tokens}")

 --18
 --62
 --80
 --98
 --116

What I see is a big jump (+44) when transitioning from one to two calls, and then a steady rate of +18 per additional call.
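A quick check of that arithmetic, just replaying the prompt_tokens values reported above:

observed = {1: 18, 2: 62, 3: 80, 4: 98, 5: 116}  # prompt_tokens by tool call count
for n in range(2, 6):
    print(f"{n-1} -> {n} calls: +{observed[n] - observed[n-1]} tokens")
# prints +44 once, then +18 three times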

My guess is that this reflects the special language the AI itself uses: emitting multiple tool calls switches to a much larger container format, with a different method for writing the calls out to the tool-recipient API backend, and telling the AI what it called in the past incurs that same overhead again.
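If that holds, the counts fit a simple model (my own fit to the five data points above, not anything documented): a flat 18 tokens per replayed call, plus a one-time ~26-token surcharge once the multi-call container kicks in.

def estimated_prompt_tokens(toolcount: int) -> int:
    # Fit to the observations: 18 tokens per replayed call,
    # plus ~26 tokens of container overhead from the second call onward.
    if toolcount == 1:
        return 18
    return 18 * toolcount + 26

print([estimated_prompt_tokens(n) for n in range(1, 6)])  # [18, 62, 80, 98, 116]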