Token Count for Fine-tuning


I submitted a fine-tuning job today with davinci as the base model. My token count estimate was 1,431,706. But after the job finished, the daily usage breakdown in my account lists 5,726,824 trained tokens, which is 4x my estimate.

The result doc at the end of the job shows 11,806,768 as the elapsed_tokens (with repeats) in the last row. With the default 4 epochs, that would mean 2,951,692 tokens per epoch, which is about 2x my estimate and about 0.5x what the account shows, so it matches neither.

Can anyone offer more clarity on this? It has been 5 hours since I sent a message to their help desk and there is still no response, so I thought I would try the community forum too. The charge estimate is also showing up as 4x my estimate. Does that mean the rate needs to be multiplied by the number of epochs? No information or page I saw suggests that the cost needs to be multiplied by epochs.

(Unrelated: someone from OpenAI should really hook up GPT-4 to this forum to make searching for answers painless, in case the answer already exists here.)


That might (possibly) be because the default n_epochs value for fine-tuning a model is 4.

This means that when you submit fine-tuning data, OpenAI processes that data four times, based on the default n_epochs value. So if your training data is, for example, 1,000 tokens, you will be charged for 4,000 tokens (with n_epochs left at the default of 4).
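As a rough sanity check of that explanation, here is a minimal Python sketch of the arithmetic. It assumes trained tokens are simply the tokens in the training file multiplied by n_epochs, as suggested above; the function name `billed_training_tokens` is made up for illustration and is not part of any OpenAI library.

```python
def billed_training_tokens(tokens_in_file: int, n_epochs: int = 4) -> int:
    """Estimate trained (billed) tokens, ASSUMING billing = file tokens x epochs."""
    return tokens_in_file * n_epochs


# Numbers from the original post: the estimate for one pass over the file.
estimate = 1_431_706

# With the default of 4 epochs, this reproduces the figure in the usage breakdown.
print(billed_training_tokens(estimate))              # 5726824

# Halving the epochs would halve the trained tokens under the same assumption.
print(billed_training_tokens(estimate, n_epochs=2))  # 2863412
```

With the default of 4 epochs, this gives exactly the 5,726,824 trained tokens shown in the account. It does not explain the elapsed_tokens value in the result doc.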




What if we set n_epochs to 2? Will it compromise model quality?

If you have “lots” of prompt/completion pairs that are all similar but slightly different, then you could probably get away with 2 epochs.

However, this is a hyperparameter, so it’s not an exact science. The AI does need to iterate over the data multiple times to converge and “learn” what you are showing it in your training file.

You want to avoid combining little data with lots of epochs, because that can lead to over-fitting, i.e. the AI will repeat itself. Sometimes this is what you want, but normally it is bad, so just be aware of it.