Is there a way to fine-tune a model for use with the Responses API (specifically, for an application that uses function calling and file search)? I was hoping to take real user queries and fine-tune the model to better call the correct function tool, and also to take function tool responses and fine-tune the resulting message back to the user.
The fine-tuning guides all reference the Chat Completions API, and the required JSONL format is the Chat Completions one. I could convert all my Responses API history into the Chat Completions format and build a JSONL file from that, but I'm skeptical that the resulting fine-tuned model would then work when I use it in a Responses API setting. Plus, Chat Completions doesn't support file search, so I guess I'd have to exclude any queries where file search was the appropriate response.
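For what it's worth, here's roughly the conversion I had in mind. The item shapes are simplified assumptions on my part, not the exact SDK types, and conversations containing internal tool calls (like file search) just get dropped:

```python
import json

def responses_items_to_chat_messages(items):
    """Best-effort conversion of simplified Responses API conversation
    items into Chat Completions fine-tuning messages.

    `items` is an assumed, simplified shape (plain message dicts plus
    'function_call' and 'function_call_output' items), not the full
    SDK object model.
    """
    messages = []
    for item in items:
        itype = item.get("type", "message")
        if itype == "message":
            messages.append({"role": item["role"], "content": item["content"]})
        elif itype == "function_call":
            # A Responses function call maps to an assistant message
            # carrying tool_calls in the Chat Completions format.
            messages.append({
                "role": "assistant",
                "tool_calls": [{
                    "id": item["call_id"],
                    "type": "function",
                    "function": {"name": item["name"],
                                 "arguments": item["arguments"]},
                }],
            })
        elif itype == "function_call_output":
            messages.append({
                "role": "tool",
                "tool_call_id": item["call_id"],
                "content": item["output"],
            })
        else:
            # No Chat Completions equivalent for internal tools such as
            # file_search_call, so skip this conversation entirely.
            return None
    return messages

def write_training_file(conversations, tools, path="train.jsonl"):
    """Write one JSONL training line per convertible conversation."""
    with open(path, "w") as f:
        for items in conversations:
            messages = responses_items_to_chat_messages(items)
            if messages is None:
                continue  # contained an internal tool call; drop it
            f.write(json.dumps({"messages": messages, "tools": tools}) + "\n")
```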
If fine-tuning and the Responses API are just incompatible right now, any idea whether it will become an option in the future?
And yes, I’ve done a lot of prompt engineering already and will continue to do so!
Have you tried it yourself? I'm also feeling a bit skeptical at the moment, but I haven't tested it with the Responses API yet. The Assistants API only supported GPT-3.5-turbo for fine-tuning with file search, so I'm curious how Responses will perform.
You can produce fine-tuning examples for functions, which are one type of tool you can add specifications for. The model is employed similarly by Responses and Chat Completions.
You cannot replicate the placement of tool specifications needed for internal Assistants or Responses tools, nor the patterns of internal tool calling.
Thus, you can use fine-tuning to get a function call understood somewhat better, for example to differentiate between your "product_search" and "knowledge_base" functions, as sketched below. However, you won't be able to fill the training examples with realistic calls or returns for an internal tool such as file search.
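For instance, a single training line steering the model toward the right function might be built like this. The tool schemas, user text, and call id are purely illustrative; use your real function specifications:

```python
import json

# Hypothetical function specs; substitute your real schemas.
tools = [
    {"type": "function", "function": {
        "name": "product_search",
        "description": "Search the product catalog",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
    {"type": "function", "function": {
        "name": "knowledge_base",
        "description": "Look up help and policy articles",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
]

# One training example: a query that should route to product_search.
example = {
    "messages": [
        {"role": "user", "content": "Do you have these boots in size 11?"},
        {"role": "assistant", "tool_calls": [{
            "id": "call_0001", "type": "function",
            "function": {"name": "product_search",
                         "arguments": json.dumps({"query": "boots size 11"})},
        }]},
    ],
    "tools": tools,
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```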
Fine-tuning with functions is a balancing act that can only be resolved by experimentation: will you end up breaking function calling even worse?
Fine-tuning can damage the model's quality at following instructions, such as the internal tool specification, but inference isn't blocked on Assistants, and file search is still an option on a newer model.
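If you do try it, inference would look something like the sketch below. The fine-tuned model id and vector store id are placeholders, and whether the Responses endpoint accepts your particular fine-tuned model is something you'd have to verify yourself:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical fine-tuned model id and vector store id.
response = client.responses.create(
    model="ft:gpt-4o-mini-2024-07-18:my-org::abc123",
    input="Where can I find the returns policy?",
    tools=[{"type": "file_search",
            "vector_store_ids": ["vs_123"]}],
)
print(response.output_text)
```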