Same data set but with the functions as tools:
Same data set but with the functions:
There are 67 training chats for both sets
I suspect it is because the tool fine-tune also includes the multi_tool_use tool for parallel tool calls, along with its description. Also, functions might not support some parts of a schema and simply ignore them.
I calculate a difference of about 911 tokens per example; you can also divide that by the number of epochs trained to obtain the extra token count per single pass over an example.
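For reference, a minimal sketch of that arithmetic (the trained-token totals below are placeholders; substitute the "Trained tokens" figures your two fine-tuning jobs actually report):

```python
# Back-of-the-envelope version of the calculation above.
trained_tokens_tools = 250_000      # placeholder: job trained with "tools"
trained_tokens_functions = 190_000  # placeholder: job trained with "functions"
n_examples = 67                     # training chats in each set
n_epochs = 3                        # n_epochs the jobs ran

per_example_diff = (trained_tokens_tools - trained_tokens_functions) / n_examples
per_pass_diff = per_example_diff / n_epochs

print(f"extra tokens per example (across all epochs): {per_example_diff:.0f}")
print(f"extra tokens per example, per single pass:   {per_pass_diff:.0f}")
```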
Here is the documented source of a lot of the bloat:
Then you also have the tokens of the AI emitting its output in a different manner, if the fine-tuning backend is programmed to invoke that parallel container wrapper even to send a single tool call: giving and getting tool call IDs that match up (and enforce) input to output in the AI's own language.
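To make that overhead concrete, here is roughly what the two message shapes look like, written as Python dicts in the standard Chat Completions form (get_weather is a made-up function for illustration):

```python
# Legacy "functions": the assistant just names the function and its arguments.
assistant_function_call = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "get_weather",          # hypothetical function
        "arguments": '{"city": "Paris"}',
    },
}

# "tools": every call is wrapped in a tool_calls array and carries an id
# that the following tool-role message must echo back.
assistant_tool_call = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",        # must be matched by the tool result
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": '{"city": "Paris"}',
            },
        }
    ],
}

tool_result = {
    "role": "tool",
    "tool_call_id": "call_abc123",      # id round-tripped back to the model
    "content": '{"temp_c": 21}',
}
```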
So:
functions = less undesired behavior, and less text you didn't write and can't improve.
tools = better handling of nesting, such as descriptions on nested objects, and more JSON schema parameters carried over into the description when the tool spec is placed.
It is a shame that OpenAI tries to obfuscate the actual AI operation in terms of the tokens actually employed.