Fine-Tune GPT-3.5 Results File: Meaning of "Step", Number of Rows, and Randomization

I fine-tuned several GPT-3.5 models, varying the size of the training set: one with 12 examples, another with 400, and another with 4000. They were run for 8, 3, and 3 epochs, respectively. I exported the contents of the result_files and have some questions. The first few lines looked like this:

step,train_loss,train_accuracy,valid_loss,valid_mean_token_accuracy
1,1.52347,0.0,,
2,2.38448,0.0,,
3,0.56809,0.0,,

The resulting files have 96, 1200, and 1500 lines (excluding the header), respectively.
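(For reference, exporting a result file can be done with something like the minimal sketch below, using the openai>=1.0 Python client; the job ID is a placeholder, and method names may differ in older package versions.)

import io
import pandas as pd
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Look up the fine-tuning job and its metrics CSV (the result file)
job = client.fine_tuning.jobs.retrieve("ftjob-abc123")  # placeholder job ID
result_file_id = job.result_files[0]

# Download the CSV bytes and load them into a DataFrame
raw = client.files.content(result_file_id).read()
df = pd.read_csv(io.BytesIO(raw))
print(len(df), "rows (steps)")
print(df.head())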

What does each line represent? Does each one correspond to running a single entry of the training set through stochastic gradient descent, where an epoch is one pass through all the entries of the training set? If so, why does the file for the third fine-tuned model consist of only 1500 lines instead of 12000? And when training, does GPT automatically randomize the order of the examples for each epoch?

In addition, why is there no value assigned to the "valid_loss" and "valid_mean_token_accuracy" columns?

"Step" most likely refers to batch progress: each step is one gradient update on a batch of training examples, not a single example. The number of steps is then (examples ÷ batch size) × epochs, which is why the third file has 1500 lines rather than 12000; your line counts are consistent with batch sizes of 1, 1, and 8 for the three runs.
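A quick sanity check of that arithmetic (the batch sizes here are inferred from the line counts in your question; the result file itself does not report them):

# steps = (examples / batch_size) * epochs
runs = [
    {"examples": 12,   "epochs": 8, "batch_size": 1},  # -> 96 steps
    {"examples": 400,  "epochs": 3, "batch_size": 1},  # -> 1200 steps
    {"examples": 4000, "epochs": 3, "batch_size": 8},  # -> 1500 steps
]
for r in runs:
    steps = (r["examples"] // r["batch_size"]) * r["epochs"]
    print(r, "->", steps, "steps")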

The two numbers missing after each comma are the validation metrics. The report is not as useful if you didn't provide validation data along with your training data: a set of similar, held-out inputs that is used to evaluate how well the model is doing.

That reporting could tell you when your training is reaching an optimum point; it would be even better if you could then continue fine-tuning to make use of that information, instead of starting again with the new endpoint.
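As a minimal sketch of attaching validation data when creating a job (openai>=1.0 Python client; the file names are placeholders, adjust to your setup):

from openai import OpenAI

client = OpenAI()

# Upload training and validation sets (both JSONL, purpose "fine-tune")
train = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
valid = client.files.create(file=open("valid.jsonl", "rb"), purpose="fine-tune")

# The validation_file parameter is what populates valid_loss and
# valid_mean_token_accuracy in the result file.
job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",
    training_file=train.id,
    validation_file=valid.id,
)
print(job.id)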

Thanks, this is really helpful! I wasn't aware that we could attach validation data as well (we've been doing that via a separate process). Could you point us to a reference on how to do this? Sorry if this is a dense question, but the docs are really terse.

The current docs are indeed terse, and it's worse than that: they tore down the prior completion-model documentation, which was better, removed GitHub cookbook examples, and block archive.org from showing a history of captures that may or may not be correct or useful.

One would have to look to guides beyond the current docs to see how the completion models, still in operation for a few more months, were trained: https://www.datacamp.com/tutorial/fine-tuning-gpt-3-using-the-open-ai-api-and-python
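For what it's worth, the legacy flow those guides describe looked roughly like this (pre-1.0 openai package, completion models only; the file IDs are placeholders):

import openai

# Legacy FineTune endpoint for completion models (davinci, curie, ...)
job = openai.FineTune.create(
    model="davinci",
    training_file="file-train123",    # placeholder uploaded-file ID
    validation_file="file-valid456",  # validation metrics appear in results
    n_epochs=3,
)
print(job["id"])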

One could experiment to see what functionality remains, but since they wrote that the Weights & Biases integration no longer works, it's a fair guess that they didn't include the ability to report validation performance metrics during tuning ("coming maybe").