Why does the use of tools in fine-tuning represent a 40% increase in the number of trained tokens?

I suspect it is because fine-tuning with tools also includes the multi_tool_use tool for parallel tool calls, along with its description. Also, the legacy functions format may not support some parts of a schema and simply ignore them.

I calculate a difference of 911 tokens per example; you can also divide that by the number of epochs trained to get the extra token count for a single pass over one example.
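
If you want to reproduce a rough estimate yourself, here is a minimal sketch that just counts the tokens of the serialized tool definitions in each training example with tiktoken. This is my own approximation, not OpenAI's documented accounting: the exact text the backend injects (wrapper tool, reformatted descriptions, etc.) is not public, so treat this as a lower bound. The file path and function name are mine.

```python
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer family used by gpt-3.5/gpt-4

def tool_overhead_tokens(example: dict) -> int:
    """Tokens consumed by the 'tools' block of one JSONL training example."""
    tools = example.get("tools", [])
    return len(enc.encode(json.dumps(tools)))

with open("train.jsonl") as f:  # hypothetical path to your training file
    examples = [json.loads(line) for line in f]

per_example = [tool_overhead_tokens(ex) for ex in examples]
print("mean tool-spec tokens per example:",
      sum(per_example) / max(len(per_example), 1))
```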

Here is where a lot of the documented bloat comes from:

Then you also have the tokens the AI spends emitting its output in a different manner, if the fine-tuning backend is programmed to invoke that parallel container wrapper even for a single tool call, plus the tool call IDs it gives and receives to match up (and enforce) the pairing of each call with its output in the AI's language.
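
To illustrate what that ID bookkeeping looks like, here is the documented Chat Completions message shape for a tool call and its result; the function name, arguments, and ID are made up:

```python
# The assistant emits an ID with each tool call...
assistant_turn = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_abc123",          # ID the model emits
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"city\": \"Paris\"}",
            },
        },
    ],
}

# ...and the tool result must echo that ID back so input and output match up.
tool_result = {
    "role": "tool",
    "tool_call_id": "call_abc123",        # must match the emitted ID
    "content": "{\"temp_c\": 18}",
}
```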

So:
functions = less undesired behavior, and less injected text that you didn't write and can't improve.
tools = better handling of nesting, such as descriptions in nested objects, and more JSON schema parameters converted into description text when the tool spec is placed into context (see the sketch below).
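
To make the comparison concrete, here is the same hypothetical get_weather training example written both ways, as I understand the training-file format; only the structural difference between the two matters here:

```python
# Legacy 'functions' format: flat spec, assistant answers with function_call.
legacy_example = {
    "messages": [
        {"role": "user", "content": "Weather in Paris?"},
        {
            "role": "assistant",
            "function_call": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
        },
    ],
    "functions": [
        {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
}

# Newer 'tools' format: the same spec nested under type "function", with the
# assistant answering via tool_calls and IDs. This is the path that triggers
# the extra backend-injected text discussed above.
tools_example = {
    "messages": [
        {"role": "user", "content": "Weather in Paris?"},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "call_abc123",
                    "type": "function",
                    "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
                }
            ],
        },
    ],
    "tools": [{"type": "function", "function": legacy_example["functions"][0]}],
}
```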

It is a shame that OpenAI obfuscates the actual operation of the AI in terms of the tokens actually consumed.
