Hi there, I have been using the new structured outputs feature and it has been working great for a few of my use cases. I am now hoping to use gpt-4o-2024-08-06 responses to fine-tune gpt-4o-mini-2024-07-18 for my use case.
I am planning to do this in a maintainable way: a script that takes a .txt file of prompts along with a Pydantic response model, automatically gets the gpt-4o-2024-08-06 responses via the Batch API, and saves them as fine-tuning data for a gpt-4o-mini model. I feel like this will be really useful for any case where gpt-4o-mini can't quite cut it on its own without fine-tuning, but gpt-4o-2024-08-06 is overkill. Basically it allows fine-tuning without having to take the time to manually assemble prompt/response pairs. I'm planning to share the repo for this once I have it in a good place; a rough sketch of the batch step is below.
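For context, here is roughly what I have in mind for building the Batch API input file. This is a minimal sketch, not the finished script: the file names, the one-prompt-per-line .txt format, the example `Sentiment` model, and the schema-tightening step are all my own assumptions (strict mode has additional schema requirements beyond what `model_json_schema()` emits, e.g. `additionalProperties: false`).

```python
# Sketch: build a Batch API input file from a .txt of prompts (one per line)
# plus a Pydantic response model. File names and the example model are placeholders.
import json
from pydantic import BaseModel, Field

class Sentiment(BaseModel):
    label: str = Field(description="One of: positive, negative, neutral")
    confidence: float = Field(description="Score between 0 and 1")

# Convert the Pydantic model to a JSON schema and tighten it for strict mode.
schema = Sentiment.model_json_schema()
schema["additionalProperties"] = False  # strict structured outputs require this

response_format = {
    "type": "json_schema",
    "json_schema": {"name": "sentiment", "strict": True, "schema": schema},
}

with open("prompts.txt") as prompts, open("batch_input.jsonl", "w") as out:
    for i, prompt in enumerate(line.strip() for line in prompts if line.strip()):
        request = {
            "custom_id": f"prompt-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-2024-08-06",
                "messages": [{"role": "user", "content": prompt}],
                "response_format": response_format,
            },
        }
        out.write(json.dumps(request) + "\n")
```

The resulting batch_input.jsonl gets uploaded to the Batch API, and the completed responses would then be paired back up with the prompts to produce the fine-tuning file.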
Has anyone tried fine-tuning with the new (Aug 6th) structured outputs yet? If so, maybe you can answer my question:
Do you need to include the response_format schema in the user messages in your fine-tuning JSONL? Or should it just be the plain system/user/assistant messages, even if a response_format schema is being used in the background? I think the key issue is that the gpt-4o output is affected by the field descriptions in the response_format definition, so how can the fine-tune capture that aspect of the prompting unless the schema is included as part of the user message in the training file? The two shapes I'm weighing are sketched below.
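To make the question concrete, here are the two training-example shapes I'm deciding between. This is purely illustrative; the system prompt, example text, and schema placeholder are made up, and I don't know which shape is actually correct for fine-tuning with structured outputs.

```python
# Illustrative only: the two candidate shapes for one line of the fine-tuning JSONL.
import json

# Placeholder for the JSON schema generated from the Pydantic model,
# including its field descriptions.
schema_text = "<JSON schema from the Pydantic response model>"

# Option A: plain messages; the schema is only supplied via response_format
# when the fine-tuned model is later called.
example_a = {
    "messages": [
        {"role": "system", "content": "Classify the sentiment of the user's text."},
        {"role": "user", "content": "The battery life is fantastic."},
        {"role": "assistant", "content": '{"label": "positive", "confidence": 0.94}'},
    ]
}

# Option B: the schema (and therefore its field descriptions) embedded
# directly in the user message of every training example.
example_b = {
    "messages": [
        {"role": "system", "content": "Classify the sentiment of the user's text."},
        {
            "role": "user",
            "content": f"Respond using this JSON schema:\n{schema_text}\n\n"
                       "Text: The battery life is fantastic.",
        },
        {"role": "assistant", "content": '{"label": "positive", "confidence": 0.94}'},
    ]
}

# Each training example would be one json.dumps(...) line in the JSONL file.
print(json.dumps(example_a))
print(json.dumps(example_b))
```

Option A keeps the training file clean but seems to lose the field-description signal; Option B preserves it but bloats every example with the schema. If anyone has tried either, I'd love to hear which one actually worked.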