Fine-tuning costs not as expected: difference of about a factor of 11


When I start a fine-tuning job, I get a message with information about the costs.
However, the cost stated in this message is roughly 11 times higher than the result of my own calculation.

For example, I have a job with 100,000 words.
According to OpenAI, 750 words are about 1,000 tokens, so 100,000 words should be about 133,333 tokens.
At $0.03 per 1,000 tokens for Davinci, that should come to about $4. However, the message states a cost of over $45.
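For reference, here is a minimal sketch of the naive estimate described above. It assumes only the two numbers from the question: the rule of thumb of 750 words per 1,000 tokens and a price of $0.03 per 1,000 tokens. The function name `naive_cost` is hypothetical and not part of any OpenAI tool.

```python
# Naive estimate from the question (assumptions, not measured values):
# 1,000 tokens ~ 750 words, and $0.03 per 1,000 tokens for Davinci.
WORDS_PER_1K_TOKENS = 750
PRICE_PER_1K_TOKENS = 0.03  # USD, price quoted in the question


def naive_cost(word_count: int) -> tuple[float, float]:
    """Return (estimated tokens, estimated USD cost) for a single pass over the data."""
    tokens = word_count * 1000 / WORDS_PER_1K_TOKENS
    cost = tokens / 1000 * PRICE_PER_1K_TOKENS
    return tokens, cost


tokens, cost = naive_cost(100_000)
print(f"{tokens:,.0f} tokens, ${cost:.2f}")  # prints "133,333 tokens, $4.00"
```

Note that this counts the training file only once and trusts the word-to-token heuristic, which are the two assumptions questioned in the replies.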

Does anyone have an idea where this difference can come from?

You need to run a tokenizer on the text to get an accurate token count, rather than relying on the rule of thumb that 750 words ≈ 1,000 tokens.

Hope that helps!


Thanks for your answer and the Tokenizer link.
This explains a factor of about 2.6.
The number of epochs (n_epochs) is set to 4. Does that account for the rest of the factor of about 11 (2.6 × 4 ≈ 10.4)?

I found the answer:
“Note that the number of training tokens depends on the number of tokens in your training dataset and your chosen number of training epochs. The default number of epochs is 4.
(Tokens in your training file * Number of training epochs) = Total training tokens”
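The quoted formula can be turned into a small cost estimate. This is a sketch under stated assumptions: the $0.03 per 1,000 tokens price comes from the question, the default of 4 epochs comes from the quote above, and the file token count below is hypothetical, obtained by scaling the word-based estimate by the ~2.6 tokenizer factor reported earlier in the thread rather than by running a real tokenizer. The name `training_cost` is illustrative only.

```python
def training_cost(file_tokens: int, n_epochs: int = 4, price_per_1k: float = 0.03) -> float:
    """Estimated USD cost: (tokens in training file * epochs) priced per 1,000 tokens."""
    total_training_tokens = file_tokens * n_epochs
    return total_training_tokens / 1000 * price_per_1k


# Hypothetical file size: word-based estimate (100,000 words -> ~133,333 tokens)
# scaled by the ~2.6 factor the tokenizer showed; a real count would replace this.
file_tokens = round(100_000 * 1000 / 750 * 2.6)
cost = training_cost(file_tokens)
print(f"{file_tokens:,} tokens per epoch, ${cost:.2f} total")
```

With these inputs the estimate comes to about $41.60, roughly 10.4 times the naive $4. The remaining gap to the quoted figure of over $45 is plausibly due to 2.6 being only an approximate factor; an exact token count from the tokenizer should close it.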
