Fine tuning with function calling / tools help!

First post, long time reader.

I have a question regarding fine-tuning a model with function calling / tools.

I want to fine tune 4o-mini based on the good responses from 4o. Basically my prompt starts with a set of instructions and tools, then I let the model execute the instructions with the tools provided. 4o is really good at doing this, but it is costly. So I thought I could collect a lot of “good responses” from the main model and use that to train the mini model. So I did this, but the fine tuned model is worst now than just 4o-min.

To fine tune the model I used the OpenAI dashboard / playground, which automatically use all the stored calls, but this also includes the intermediate steps, like when you send the result of a tool execution and you wait for a response.

So, after all this introduction, my question is: should I be only fine tuning the model with full length conversations (so 1 full complete execution log), rather than than with intermediate steps like is done via the dashboard?

Hello! I don’t think it’s necessary to include an intermediate step here. The model should have a clear understanding of when and how to call the appropriate tool directly. My approach would involve training the mini-model using user messages that already include responses gathered from the tools.

Thank you! That is what I thought. The OpenAI Dashboard feature of storing and training is quite convenient but the downside is that you can’t pick and choose which of the conversations to store. I will try setting up a clean training set with conversations end to end.