Fine-tuning finishes before the full training

Over the last month I have submitted several models for API fine-tuning, but in the last two days the fine-tuning has finished before completing the full number of epochs.

e.g.

  1. I submit gpt-4o to train for 3 epochs.
  2. OpenAI finishes the fine-tuning after the first epoch.

The problem is that this is not just a UI issue: I tested another model from the same day that did run for 3 epochs, and the results are 93% vs. 83%.

Has this happened to anyone? Or does anyone know how to contact support for these cases?


For investigation, you would need to document the training job ID, along with high certainty that the correct options were chosen in the UI or specified as hyperparameters in a recorded API request.
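For example, something like this (a sketch using the Python SDK; the job ID is a placeholder) captures what OpenAI actually recorded for the job:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "ftjob-abc123" is a placeholder; use the job ID shown in the UI or returned by the API
job = client.fine_tuning.jobs.retrieve("ftjob-abc123")

print(job.model)             # base model requested
print(job.hyperparameters)   # n_epochs etc., as recorded by OpenAI
print(job.status)            # e.g. "succeeded"
print(job.trained_tokens)    # tokens actually trained on
print(job.fine_tuned_model)  # the resulting ft: model name
```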

Then for contacting OpenAI to report a potential service problem, you would use “help” through the platform site’s menu, send a message, and convince the first tier that it needs staff investigation instead of a how-to bot response.

If you really want to get into the nuts and bolts, you could use the browser's developer tools to capture the API request that the platform site client sends to initiate the fine-tuning job.

Without specifically setting hyperparameters, OpenAI gets to decide the number of epochs based on the training set size, but that auto choice is usually more like 3-9 epochs.
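If you want to take "auto" out of the picture entirely, you can pin the epochs yourself when creating the job via the API. A minimal sketch (the file ID is a placeholder for your uploaded training file):

```python
from openai import OpenAI

client = OpenAI()

# "file-abc123" is a placeholder for the training file ID returned by client.files.create()
job = client.fine_tuning.jobs.create(
    model="gpt-4o-2024-08-06",
    training_file="file-abc123",
    hyperparameters={"n_epochs": 3},  # explicit, so nothing is decided for you
)
print(job.id, job.hyperparameters)
```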

To avoid wasting the expense of the first round of weights you already have, you can continue training on the ft: model that was created by that single pass-through, by specifying it as the model when sending another fine-tune job.
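Roughly like this (a sketch; the ft: model name and file ID are placeholders), assuming you reuse the same training file as the first job:

```python
from openai import OpenAI

client = OpenAI()

# Continue training on the model produced by the cut-short job
job = client.fine_tuning.jobs.create(
    model="ft:gpt-4o-2024-08-06:my-org::abc123",  # placeholder: your existing ft: model
    training_file="file-abc123",                  # placeholder: same training file as before
    hyperparameters={"n_epochs": 2},              # the additional passes you still want
)
```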

Sending by UI

Initial UI report

Network request log


Try 2: file uploaded through the UI, with only the slider touched to increment the epochs from DEFAULT=1 (one potential source of the issue) to 3.

Another case of the correct API request:

No probs, completed.

We set the epochs parameter to 3 over a total of 1,143 training data points, but it only trained for one epoch.
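If it helps with the report, the job's event log should show how far training actually got. A sketch with the Python SDK (the job ID is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder job ID; events include step/epoch progress messages
events = client.fine_tuning.jobs.list_events("ftjob-abc123", limit=50)
for event in events.data:
    print(event.created_at, event.message)
```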

Training UI:


See the attached pictures of the hyperparameters:

I would have to conclude that the learning rate multiplier (a multiplier applied on top of the defaults that aren't exposed) is ridiculously high.
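If that is the cause, one way to rule it out on a re-run is to pin all three exposed hyperparameters instead of leaving any on auto. A sketch (file ID and values are purely illustrative, not recommendations):

```python
from openai import OpenAI

client = OpenAI()

# Placeholders throughout; the point is simply that nothing is left on "auto"
job = client.fine_tuning.jobs.create(
    model="gpt-4o-2024-08-06",
    training_file="file-abc123",
    hyperparameters={
        "n_epochs": 3,
        "learning_rate_multiplier": 0.5,  # illustrative value only
        "batch_size": 4,                  # illustrative value only
    },
)
```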

You can see the loss bottoming out immediately, and then the spikes at the introduction of new contexts, which never reduce in magnitude.
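The same curve can be pulled as numbers from the job's result file once it completes, if you want more than the UI graph. A sketch (the job ID is a placeholder, and the exact CSV columns and encoding may vary):

```python
from openai import OpenAI

client = OpenAI()

job = client.fine_tuning.jobs.retrieve("ftjob-abc123")  # placeholder job ID
if job.result_files:
    # The result file is a per-step metrics CSV (columns such as step and train_loss);
    # note: the content may come back base64-encoded, so decode further if needed.
    metrics = client.files.content(job.result_files[0]).read().decode("utf-8")
    print(metrics[:500])
```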

I guess it depends on whether it performs, and how the result compares to the intermediate-step models that should have been generated.