Hi. I am using gpt-4o, fine-tuned with function calling through the tools array. The fine-tuning was successful, and the fine-tuned model works fine. The fine-tuning was done with the tools array included, so each item in the JSONL file has both a messages array and a tools array. I took the ideas from various posts, the cookbook, and ChatGPT.
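For reference, each training line looks roughly like this (the function name, fields, and values below are placeholders, not my real schema):

```python
import json

# One training example (written as a single line in the JSONL file).
# The assistant turn demonstrates the tool call; "tools" holds the same
# function schema that would be sent at inference time.
example = {
    "messages": [
        {"role": "system", "content": "You are a weather assistant."},
        {"role": "user", "content": "What's the weather in Paris?"},
        {
            "role": "assistant",
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
            }],
        },
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```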
What I realized after this is that I still need to send the tools array in each API call, even though the fine-tuning file already has that same tools array with each messages array. That increases my input tokens by a lot, and I chose fine-tuning with function calling precisely to counter that. So I need some help with two questions, please.
1. Am I missing something somewhere, or do I need to attach the tools array in each API call to the fine-tuned model? (The tools array is present with each messages array in the fine-tuning JSONL file, and the fine-tuning is successful and working.)
2. Is there another option I can deploy where I can use my functions and still have the most optimized token usage?
OpenAI's documentation about fine-tuning suggests one could even leave out the function specification. In practice, this is not the case.
If you do not pass a function definition, the tool call recipient for functions will not be turned on in the API backend.
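For example, a call to the fine-tuned model still has to carry the same tools parameter that was in the training file. A minimal sketch with the Python SDK; the model ID and function schema are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# The same tools array that was in the fine-tuning file still has to be
# sent with every request, or function calling is not enabled for the call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # placeholder function
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="ft:gpt-4o-2024-08-06:my-org::abc123",  # placeholder fine-tuned model ID
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```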
Fine-tuning has further trained the AI on seeing the tool specification and responding appropriately. When that input pattern is broken (no function specification of the kind the AI was trained on, no parallel tool call wrapper), quality will drop regardless, even apart from losing the instructions the specification contains. The specification can be seen as part of the input, like the system message training, that “activates” your fine-tuned model.
A strict structured output for functions also requires the full schema to set up the special enforcement.
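As a sketch, enabling strict enforcement means the complete schema, with every property listed in required and additionalProperties set to false, has to be present in each request (the function shown is illustrative):

```python
# Strict mode is opted into per function and requires the full schema in
# every call, so the specification cannot be omitted to save tokens.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}]
```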
The token reduction, both in fine-tuning and at inference, can instead come from a shorter system message (and, in an automated task, a shorter additional user instruction prompt) than would otherwise be required to get the desired AI behavior and understanding. A 1,000-token system message could be reduced to just “An API assistant named GURU” if your training example data is of high enough quality.
If you have confidence in your training regime for function calling, you can also train on and use reduced description fields in functions.
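A rough sketch of what that trimmed request might look like, assuming the model was fine-tuned on the same shortened system message and pared-down descriptions (all names and values are illustrative):

```python
# Trimmed-down request: a short system message plus pared-down description
# fields, relying on the fine-tune to supply the behavior. Only worthwhile
# if the training examples used the same reduced prompting.
messages = [
    {"role": "system", "content": "An API assistant named GURU"},
    {"role": "user", "content": "What's the weather in Paris?"},
]
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        # "description" left out entirely to save tokens; the fine-tune
        # has learned when and how to call the function.
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
```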
The highest performance in a domain will come from the full prompting that makes the AI perform best, combined with fine-tuning on that same full prompting. It will just be more expensive.
Personally, it might also be that those tool calls could be caught earlier by some fine-tuned model calls injected into your workflow. But without knowing the app context or your workflow, it's hard to tell if that's the case.