Fine-Tuning & Function Calling

Hi all,

I am trying to combine fine-tuning and function calling. I have fine-tuned a model (gpt-4o-mini) and am now trying to use it with function calls. However, the fine-tuned model no longer calls my functions (tool_choice='auto'). If I use the non-fine-tuned base model (gpt-4o-mini), my functions are called reliably.

I can use the parameter tool_choice='required' to force my fine-tuned model to use my functions. However, I want the model to decide for itself when to use the functions and when not to.
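For context, this is roughly the kind of request I mean; the fine-tuned model ID and the get_weather tool here are just placeholders, not my real setup:

```python
from openai import OpenAI

client = OpenAI()

# Placeholder tool definition; my real functions are more complex.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:my-org::abc123",  # placeholder fine-tuned model ID
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather in Berlin?"},
    ],
    tools=tools,
    tool_choice="auto",  # with "auto" the fine-tuned model rarely calls the tool
)

print(response.choices[0].message.tool_calls)
```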

Forcing the model to call a function does not always work. Often the API returns an error:

openai.InternalServerError: Error code: 500 - {'error': {'message': 'The model has produced invalid content. Consider changing your prompt if you keep seeing this error.', 'type': 'model_error', 'param': None, 'code': None}}
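For anyone trying to reproduce this: the exception comes back as openai.InternalServerError in the Python SDK, so a crude workaround is to catch it and retry. A minimal sketch, not my actual code:

```python
import time

import openai
from openai import OpenAI

client = OpenAI()

def create_with_retry(max_retries=3, **kwargs):
    """Retry on the intermittent 500 'model has produced invalid content' error."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.InternalServerError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff
```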

Has anyone else had this experience? How did you solve it?

This is what my custom GPT says about your issue…

Thanks for clarifying that you’re working with a fine-tuned version of gpt-4o-mini. Based on the error you shared and the fact that the function call works fine with the non-fine-tuned model, it seems the fine-tuning process might have affected how the model handles function calls or generates outputs expected by the API.

Possible Causes:

  1. Function Call Format Issue: The fine-tuning may have inadvertently altered how the model structures function calls, causing the model to generate invalid or malformed function responses. This is consistent with the error message, where the output from the model is invalid.
  2. Overfitting During Fine-Tuning: The fine-tuned model might be overfitting on specific patterns or content, which could affect its ability to generate valid outputs for function calls. This happens if the fine-tuning data emphasizes certain outputs too strongly, affecting general behaviors like function calling.
  3. Different Output Schema in Fine-Tuned Model: If you customized how outputs are formatted during fine-tuning (e.g., different tokenization or content formatting), this could lead to mismatches between the expected function call format and what the fine-tuned model generates.

Troubleshooting Steps:

  1. Review the Fine-Tuning Data: Ensure that during the fine-tuning process, the function call structure was preserved and handled correctly in the training data. If function calls were part of the training, check the examples and outputs to ensure they match the expected format (a sketch of that format follows after this list).
  2. Test with Simpler Prompts: Try running simpler function call prompts with the fine-tuned model to see if it’s a general issue with how the model handles function calls, or if it’s related to the specific case you’re working on.
  3. Compare Output Logs: Compare the outputs of the fine-tuned and non-fine-tuned models for the same function call. Look for differences in how the models structure the function call response.
  4. Validate the Function Call Format: Ensure that the fine-tuned model is producing function call outputs in the correct structure. Sometimes, adjusting the prompt to explicitly structure the function call output might mitigate the issue.
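To make step 1 concrete, here is roughly what a single training example with a preserved tool call looks like in the chat fine-tuning JSONL: an assistant message carrying tool_calls plus the matching tools definition. The tool name and arguments are placeholders; check the current fine-tuning docs for the exact schema.

```python
import json

# One training example in the chat fine-tuning format, including an
# assistant tool call. The get_weather tool and its arguments are placeholders.
example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather in Berlin?"},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    "function": {
                        "name": "get_weather",
                        "arguments": json.dumps({"city": "Berlin"}),
                    },
                }
            ],
        },
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Append the example as one line of the JSONL training file.
with open("training_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```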

If none of these work, consider revisiting the fine-tuning process and retraining the model with a stronger emphasis on function call behavior. You could also escalate the issue to OpenAI if it persists across different prompts, as it could be a deeper model issue.

I tried it out with the 'get_weather' example from the Playground. There, the function is also called by my fine-tuned model most of the time. It doesn't work reliably every time, but in most cases it does.

This means that the model has not forgotten how to call functions. I will try to rework my system prompt.
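To check this more systematically than clicking around in the Playground, something like the following comparison could work; the fine-tuned model ID is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# Simplified version of the Playground's get_weather tool.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }
]

def tool_call_rate(model: str, prompt: str, n: int = 10) -> float:
    """Fraction of n runs in which the model decides to call a tool."""
    calls = 0
    for _ in range(n):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            tools=tools,
            tool_choice="auto",
        )
        if response.choices[0].message.tool_calls:
            calls += 1
    return calls / n

prompt = "What's the weather like in Berlin today?"
print("base:      ", tool_call_rate("gpt-4o-mini", prompt))
print("fine-tuned:", tool_call_rate("ft:gpt-4o-mini-2024-07-18:my-org::abc123", prompt))  # placeholder ID
```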


(That never happens to me with a model that is not fine-tuned.)