I have a relatively large dataset of example prompts and their corresponding assistant outputs, which I was going to use to fine-tune a model that uses the Chat Completions API.
The main reason for this is that I was using function calling, and the model would often produce outputs that didn’t match the JSON schema.
I have since decided to switch to the Assistants API because of its added features and larger context window.
Personally, my workflow is to test the model “as-is”; then, if there are issues, I use prompt design to try to solve them, and if I need domain-specific information from large datasets, I’ll use RAG. If at that point I still have output-formatting problems, I will first try programmatic correction with traditional code, and as a final step I may choose to fine-tune a model.
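For the “programmatic correction” step, something as simple as validating the returned arguments against the same JSON schema and asking the model to retry catches a lot of formatting slips before you ever reach for fine-tuning. Here’s a rough sketch, assuming the v1 `openai` Python SDK and the `jsonschema` package; the `get_weather` tool, its schema, and the model name are placeholders, not anything from my actual project:

```python
# Minimal sketch of schema validation + one retry for function-call arguments.
# Assumes the v1 openai SDK and the jsonschema package; tool and model are placeholders.
import json
from jsonschema import validate, ValidationError
from openai import OpenAI

client = OpenAI()

# Hypothetical argument schema, used both in the tool definition and for validation.
ARGS_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": ARGS_SCHEMA,
    },
}]

def get_valid_arguments(messages, retries=1):
    """Call the model, validate the tool-call arguments, and retry once if they fail."""
    for _ in range(retries + 1):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: any tools-capable chat model works here
            messages=messages,
            tools=TOOLS,
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return None  # the model chose not to call the function
        raw = message.tool_calls[0].function.arguments
        try:
            args = json.loads(raw)
            validate(instance=args, schema=ARGS_SCHEMA)
            return args  # arguments conform to the schema
        except (json.JSONDecodeError, ValidationError) as err:
            # Feed the validation error back and ask the model to correct itself.
            messages = messages + [
                {"role": "assistant", "content": raw},
                {"role": "user", "content": f"Those arguments were invalid ({err}). "
                                            "Call the function again with schema-valid arguments."},
            ]
    raise ValueError("Could not obtain schema-valid function arguments")
```

It’s crude, but a loop like this often fixes the occasional malformed output far more cheaply than building and maintaining a fine-tuning dataset.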
This works for my use cases but may not be suitable for all. Still, I would say that fine-tuning is primarily a final step for polishing a model’s output if required. Of course, if you are doing something like moderation or specific classification, then you may want to go for fine-tuning right away.
By all means, give your fine-tune a try. AI and its related subsystems are too new a technology for anyone to say categorically that dataset A will produce outputs of type B with much certainty. We can give general findings and rules of thumb, but anything over and above that will a) probably be wrong and b) be out of date in 6 months.