Difference in token count between gpt-4 and gpt-3.5 (also gpt-4-1106-preview): does anyone know the reason?

forestwanglin · January 22, 2024, 9:00am

I sent the request below to the OpenAI API.
It returns a token count of 35.
When I change the model to gpt-3.5-turbo or gpt-4-1106-preview, it returns 34.

Does anyone know the calculation logic? What is the difference between gpt-4 and gpt-3.5-turbo?

{
	"model": "gpt-4",
	"messages": [
		{
			"role": "assistant",
			"content": null,
			"tool_calls": [
				{
					"id": "call_Id8ycVMsW8gdsf7kSXfgAcf1",
					"type": "function",
					"function": {
						"name": "get_current_weather",
						"arguments": "{\n  \"location\": \"Boston, MA\"\n}"
					}
				}
			]
		},
		{
			"role": "tool",
			"tool_call_id": "call_Id8ycVMsW8gdsf7kSXfgAcf1",
			"name": "get_current_weather",
			"content": "29 degree celcius"
		}
	]
}
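For context, the OpenAI Cookbook ("How to count tokens with tiktoken") publishes a per-message counting recipe, `num_tokens_from_messages`, which uses the same constants for gpt-3.5-turbo-0613 and gpt-4 models: 3 tokens per message, 1 extra token when a `name` field is present, and 3 tokens to prime the reply. Below is a minimal Java sketch of that recipe, not openai-java's implementation. `MessageTokenEstimate` and `estimate` are hypothetical names, the `encoder` parameter is a placeholder for a real cl100k_base tokenizer, and only string-valued fields are counted, so `tool_calls` is not handled and this does not reproduce the 35 vs 34 numbers above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

class MessageTokenEstimate {

    // Constants taken from the OpenAI Cookbook recipe for
    // gpt-3.5-turbo-0613 / gpt-4 style chat models (assumed, not verified here).
    static final int TOKENS_PER_MESSAGE = 3;
    static final int TOKENS_PER_NAME = 1;
    static final int REPLY_PRIMING = 3;

    // encoder is a hypothetical stand-in for a real tokenizer.
    // Only String-valued fields are counted; tool_calls is NOT handled.
    static int estimate(List<Map<String, String>> messages, ToIntFunction<String> encoder) {
        int total = 0;
        for (Map<String, String> message : messages) {
            total += TOKENS_PER_MESSAGE;
            for (Map.Entry<String, String> field : message.entrySet()) {
                total += encoder.applyAsInt(field.getValue());
                if (field.getKey().equals("name")) {
                    total += TOKENS_PER_NAME;
                }
            }
        }
        return total + REPLY_PRIMING;
    }

    public static void main(String[] args) {
        // Toy encoder for demonstration only: counts whitespace-separated words.
        ToIntFunction<String> toyEncoder = s -> s.isBlank() ? 0 : s.trim().split("\\s+").length;
        Map<String, String> tool = new LinkedHashMap<>();
        tool.put("role", "tool");
        tool.put("name", "get_current_weather");
        tool.put("content", "29 degree celcius");
        System.out.println(estimate(List.of(tool), toyEncoder)); // prints 12
    }
}
```

Note that the per-message overhead in this recipe is identical for gpt-4 and gpt-3.5-turbo-0613, so on its own it would not explain a one-token difference between those models.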

forestwanglin · January 30, 2024, 5:14am

After investigating this issue for a few days, I found that the token count is the same for gpt-4, gpt-3.5-turbo, and other popular models.

The difference is caused by the arguments field in tool_calls.

When we send tools to OpenAI, we get a message like the one below:

"tool_calls": [{
    "id": "call_h921N3VXwHw0RI7fgn6USsKS",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": "{\n\"location\":\"Shanghai\",\n\"unit\":\"celsius\"\n}"
    }
}]

Other times it returns the function like this:

"function": {
    "name": "get_weather",
    "arguments": "{\"location\":\"Shanghai\",    \"unit\":\"celsius\"}"
}

The difference between them is obvious: the spaces, tabs, and newline characters inside arguments.

After testing, I reached this conclusion: OpenAI counts the tokens in arguments with spaces and tabs removed, but with newline characters kept. For example:

"function": {
    "name": "get_weather",
    "arguments": "{\n\"location\":\"Shanghai\",\n\"unit\":\"celsius\"\n}"
}
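The rule above can be sketched in Java. This is a minimal sketch assuming the rule as stated in this thread, which is an empirical observation rather than documented OpenAI behavior; `ArgumentsNormalizer` and `stripSpacesAndTabs` are hypothetical names, not part of openai-java. It drops spaces and tabs only outside JSON string literals and keeps newlines. Whether spaces inside values such as "Boston, MA" are also dropped is not stated in the thread, so keeping them is an assumption.

```java
class ArgumentsNormalizer {

    // Hypothetical helper: removes spaces and tabs that sit outside JSON
    // string literals and keeps everything else, including '\n'.
    static String stripSpacesAndTabs(String arguments) {
        StringBuilder out = new StringBuilder(arguments.length());
        boolean inString = false; // currently inside a "..." literal?
        boolean escaped = false;  // previous char was a backslash inside a literal?
        for (int i = 0; i < arguments.length(); i++) {
            char c = arguments.charAt(i);
            if (inString) {
                out.append(c); // whitespace inside values is kept (assumption)
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (c != ' ' && c != '\t') {
                out.append(c); // structural chars and newlines survive
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String compact = "{\"location\":\"Shanghai\",    \"unit\":\"celsius\"}";
        String multiline = "{\n  \"location\": \"Boston, MA\"\n}";
        System.out.println(stripSpacesAndTabs(compact)); // prints {"location":"Shanghai","unit":"celsius"}
        System.out.println(stripSpacesAndTabs(multiline));
    }
}
```

The normalized string would then be passed to whatever tokenizer is used for counting. Tracking string literals avoids corrupting values that legitimately contain spaces, at the cost of a slightly longer loop than a plain `replace`.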

I’ve fixed this in version 3.7.20240130 of openai-java.

Please feel free to point out anything I’ve got wrong.
