I am fine-tuning the GPT-3.5 Turbo model. When I fine-tune, the default batch size appears to be 1, judging from the loss plot. My training set only contains a few hundred examples. I am curious why the batch size is set to 1. Is there a specific reason for this? I tried increasing the batch size, but performance seems to get worse with larger batch sizes. Why does this happen?
The batching of data is no longer under your control when fine-tuning. The only hyperparameter you can specify is the number of epochs, and if you don't set it yourself, it is chosen automatically based on the amount of training data. For the actual meaning of "batch", we can look at the docstring from source code you can no longer use:
batch_size: Optional[int]
"""The batch size to use for training.
The batch size is the number of training examples used to train a single forward
and backward pass.
By default, the batch size will be dynamically configured to be ~0.2% of the
number of examples in the training set, capped at 256 - in general, we've found
that larger batch sizes tend to work better for larger datasets.
"""
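For a few hundred training examples, that heuristic lands right at a batch size of 1, which would explain what you are seeing. Here is a rough reconstruction of the default (my own inference from the docstring above, not OpenAI's actual code):

# Rough reconstruction of the documented default batch size heuristic
# (an inference from the docstring, not official code).
def default_batch_size(n_examples: int) -> int:
    # ~0.2% of the training set, at least 1, capped at 256
    return min(256, max(1, round(0.002 * n_examples)))

print(default_batch_size(500))      # 1   -> a few hundred examples gives batch size 1
print(default_batch_size(200_000))  # 256 -> the cap kicks in for very large sets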
I think you are referring instead to "steps", the metric that is now displayed. The only relationship these seem to have is the number of times a validation check is run against the validation file and inputs to produce statistics for you. That count can run into the thousands for tens of thousands of examples × epochs; for smaller jobs it does appear to be one step per example per epoch.
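As a rough sanity check on how steps, examples, epochs, and batch size might relate (my own inference from the loss plots, not documented behavior):

import math

def expected_steps(n_examples: int, n_epochs: int, batch_size: int) -> int:
    # One optimization step per batch, per epoch (an assumption, not documented behavior)
    return math.ceil(n_examples / batch_size) * n_epochs

print(expected_steps(500, 3, 1))   # 1500 steps: one per example per epoch
print(expected_steps(500, 3, 10))  # 150 steps: fewer steps with a larger batch

With a few hundred examples and a batch size of 1, the step count would simply mirror examples × epochs.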
I appreciate your view on batch size. However, in my experience, I was able to control it with this API call:
hyperparameters = {"n_epochs": 5, "learning_rate_multiplier": 0.0001, "batch_size": 10}
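For reference, the full call looked roughly like this (a sketch from memory using the openai Python client; the training file ID is just a placeholder):

from openai import OpenAI

client = OpenAI()

# "file-abc123" is a placeholder for my uploaded JSONL training file
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="gpt-3.5-turbo",
    hyperparameters=hyperparameters,  # the dict shown above
)
print(job.id)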
In the loss plot, it looks like the number of steps becomes much smaller with the larger batch size.
Could you clarify why you think batch size isn’t controllable? Thank you very much.
If you are fine-tuning models that won't be turned off in January 2024 (i.e., not the legacy ones), using the replacement "fine_tuning" endpoint (not the legacy endpoint with "fine-tunes" in the URL - I know, confusing), your learning hyperparameters are limited to just epochs.
All other learning parameters are auto-tuned based on the training file, and even epochs is chosen automatically if not specified.
We see 1500 steps in some huge jobs, which is quite different from the prior maximum batch count. I have not trained jobs with 10,000+ examples, so I can't pin down the now-undocumented batch/step behavior at that scale. Under 1,000 (examples × epochs), you seemingly get one step per example.
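If you want to verify what was actually applied to a given job, you can retrieve it after creation; the job object reports the hyperparameters the endpoint settled on (a minimal sketch with a placeholder job ID, assuming the current openai Python client):

from openai import OpenAI

client = OpenAI()

# "ftjob-abc123" is a placeholder for your fine-tuning job ID
job = client.fine_tuning.jobs.retrieve("ftjob-abc123")

print(job.status)           # e.g. "succeeded"
print(job.hyperparameters)  # the epochs (and anything auto-selected) the job actually used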