Hello,
we decided to finetune Gpt4.1 or Gpt4.1-mini for both improving tool use ability and the final response generation.
In our use case, we let the model access several tools (around 20).
We opted for using large, expensive, reasoner models (e.g. o3 and o4) to pre-determine which tools are the best ones to call and in which order (for a given user query).
In the finetuning docs is reported that we need to provide samples for training that follows the chat-completition format.
So, ideally, we should come with a set of chats as the following example:
{
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "<User query>"
},
{
"role": "assistant",
"tool_calls": [
{
"id": "call_id_1",
"type": "function",
"function": {
"name": "tool_name_1",
"arguments": {"arg1": "value1", "arg2": "value2"}
}
}
]
},
{
"role": "tool",
"tool_call_id": "call_id_1",
"content": [
{
"type": "text",
"text": "Tool1 response content"
}
]
},
{
"role": "assistant",
"tool_calls": [
{
"id": "call_id_2",
"type": "function",
"function": {
"name": "tool_name_2",
"arguments": {"arg1": "value1", "arg2": "value2"}
}
}
]
},
{
"role": "tool",
"tool_call_id": "call_id_2",
"content": [
{
"type": "text",
"text": "Tool2 response content"
}
]
},
{
"role": "assistant",
"content": "Final response"
}
],
"parallel_tool_calls": false,
"tools": [
// list of available tools
]
}
Schematically:
- system message
- user message
- assistant that says âI want to use tool 1â
- response from the tool 1
- repeat until no more tools are needed
- final assistant response
Assuming that the above json is correct, and that one can insert tool outputs in the order shown above, our doubts are related to the following points:
- Is the loss of the model computed only for âassistantâ role messages? We do not want the model to âlearn to predictâ tool outputs (thatâs really crucial)
- Is it a good practice to teach the model how to perform tool calls AND how to provide final response, in the same training sample? Are there better alternatives to the above json?
- In the final training sample, the one to be submitted to training API, do we need to insert fake ids for the tool calls or is it fine to leave place-holders like in the example (e.g. âcall_id_1â)?
- What if one wants to perform parallel tool calls? We set 1 message of the assistant calling multiple tools and then concatenating several role=âtoolâ messages, one for each tool call?
Thank you all for reading, I hope this post will help others aswell
Resource: https://platform.openai.com/docs/api-reference/fine-tuning/chat-input