I’ve been working on a fairly complex emotive-categorization task over text, where the model returns categorizations and alternatives for specific phrases in JSON format. I’ve made many fine-tunes of gpt-4o-mini and gpt-3.5-turbo, and there have still been accuracy issues, so before doubling my training set again, I figured I’d see what happens when I spend the extra money to fine-tune gpt-4o.
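For context, each training example uses the standard chat-format JSONL, with the assistant turn containing exactly the JSON I want back. The field names below are simplified stand-ins rather than my real schema, and it’s shown pretty-printed here even though the actual file has one example per line:

```json
{"messages": [
  {"role": "system", "content": "Categorize the emotive phrases in the text. Respond with JSON only."},
  {"role": "user", "content": "I was thrilled at first, but honestly a bit let down by the ending."},
  {"role": "assistant", "content": "{\"phrases\": [{\"text\": \"thrilled\", \"category\": \"joy\", \"alternatives\": [\"excitement\"]}, {\"text\": \"let down\", \"category\": \"disappointment\", \"alternatives\": [\"sadness\"]}]}"}
]}
```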
At first, the training graph looked pretty promising. The gpt-4o run reached a lower validation loss than gpt-4o-mini and didn’t seem to overfit as quickly:
gpt-4o-mini, by comparison, had a higher minimum validation loss before seeming to overfit on the third epoch:
However, when I got around to actually testing the gpt-4o model, it completely failed to use the basic JSON format I asked for, something I could get gpt-4o-mini and gpt-3.5-turbo to pick up from a training set of 15 examples or fewer. Even with the format explained in the system prompt, and after training for 3 epochs on 80 examples of proper formatting, this gpt-4o model settled on a format of its own: valid JSON, but not what I want at all, and not something a single training example ever demonstrated even once. It’s basically useless for my purposes.
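To be concrete about what “testing” looks like, this is roughly the check the gpt-4o model fails (the fine-tune ID and field names are placeholders, not my real ones):

```python
import json
from openai import OpenAI

client = OpenAI()

# Placeholder fine-tune ID and schema; mine differ, but the check is the same idea.
MODEL = "ft:gpt-4o-2024-08-06:my-org::abc123"
EXPECTED_KEYS = {"text", "category", "alternatives"}

resp = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "Categorize the emotive phrases in the text. Respond with JSON only."},
        {"role": "user", "content": "I was thrilled at first, but honestly a bit let down by the ending."},
    ],
)

data = json.loads(resp.choices[0].message.content)  # this passes; the output *is* valid JSON...

# ...but these fail: the gpt-4o fine-tune invents its own structure instead
# of the {"phrases": [{text, category, alternatives}, ...]} shape that every
# training example used.
assert "phrases" in data, f"unexpected top-level shape: {list(data)}"
for phrase in data["phrases"]:
    assert EXPECTED_KEYS <= phrase.keys(), f"unexpected phrase shape: {phrase}"
```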
One thing I’m considering is that there could have just been some fluke in how this model was trained. During training, the job gave a strange error I’d never seen before, “The job experienced an error while training and failed, it has been re-enqueued for retry”:
I don’t really want to spend another $10 and just “try again” based on this theory, though.
Is gpt-4o known to have problems with JSON formatting? I know that, as a larger model, it can be harder to fine-tune, so you might need a larger training set or more epochs. Could this problem just be a result of that? It seems too weird for that, though, because it’s such a fundamental formatting failure.