Finetuning costs not as expected. Difference of about factor 11

JohnMichaels · January 6, 2023, 12:15pm

Hi,

When I start a fine-tuning job, I get a message with information about the costs.
The cost stated in that message is roughly 11 times higher than the result of my own calculation.

For example, I have a job with 100,000 words.
According to OpenAI, 750 words are about 1,000 tokens, so 100,000 words would be about 133,333 tokens.
At the Davinci price of $0.03 per 1,000 tokens, that should add up to about $4. However, the message states a cost of over $45.
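
For reference, the estimate above can be written out as a small sketch. It uses only the numbers quoted in this post (750 words per 1,000 tokens, $0.03 per 1,000 tokens); the function names are made up for illustration.

```python
# Naive estimate from the post: word count -> tokens -> dollars.
WORDS_PER_1K_TOKENS = 750    # rule of thumb quoted in the post
PRICE_PER_1K_TOKENS = 0.03   # Davinci price quoted in the post (USD)


def naive_token_estimate(words: int) -> float:
    """Estimate tokens from a word count using the 750-words rule of thumb."""
    return words / WORDS_PER_1K_TOKENS * 1000


def cost_usd(tokens: float, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Cost of processing `tokens` once at the given price per 1,000 tokens."""
    return tokens / 1000 * price_per_1k


tokens = naive_token_estimate(100_000)
print(round(tokens))               # 133333
print(round(cost_usd(tokens), 2))  # 4.0
```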

Does anyone have an idea where this difference comes from?

PaulBellow · January 6, 2023, 5:32pm

You need to run a tokenizer on the text to get the accurate result, not just guess at 750 words = 1000 tokens…
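
A minimal sketch of what running a tokenizer could look like in Python. It assumes the third-party `tiktoken` package is installed and that the `r50k_base` encoding is the one that applies to the base Davinci model; both are assumptions on top of what this thread says. `tokens_per_word` is a hypothetical helper for comparing the real count with the 750-words rule.

```python
def count_tokens(text: str, encoding_name: str = "r50k_base") -> int:
    """Count tokens with tiktoken (third-party: pip install tiktoken).

    Assumption: r50k_base matches the model being fine-tuned.
    """
    import tiktoken  # imported lazily so the helper below works without it

    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


def tokens_per_word(n_tokens: int, text: str) -> float:
    """Observed tokens-per-word ratio, counting whitespace-separated words."""
    n_words = len(text.split())
    return n_tokens / n_words if n_words else 0.0
```

The rule of thumb corresponds to roughly 1.33 tokens per word. If `tokens_per_word(count_tokens(text), text)` comes out well above that for your training data, the word-based estimate will be too low by about the same proportion.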

Hope that helps!

JohnMichaels · January 9, 2023, 9:19am

Thanks for your answer and the tokenizer link.
This explains a factor of approx. 2.6.
The number of epochs (n_epochs) is set to 4. Does this explain the factor of approx. 11 (2.6 × 4)?

Edit:
I found the answer:
“Note that the number of training tokens depends on the number of tokens in your training dataset and your chosen number of training epochs. The default number of epochs is 4.
(Tokens in your training file * Number of training epochs) = Total training tokens”
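
Putting the quoted formula together with the numbers from this thread, as a rough sketch. The 2.6 tokenizer factor and the $0.03 price are the approximate values mentioned above, and `training_cost_usd` is a hypothetical helper name, so the result is only a ballpark.

```python
def training_cost_usd(file_tokens: int, n_epochs: int = 4,
                      price_per_1k: float = 0.03) -> float:
    """(Tokens in training file * epochs) billed at a price per 1,000 tokens."""
    total_training_tokens = file_tokens * n_epochs
    return total_training_tokens / 1000 * price_per_1k


naive_tokens = 133_333    # word-based estimate from the first post
tokenizer_factor = 2.6    # approximate factor found with the tokenizer
file_tokens = round(naive_tokens * tokenizer_factor)

print(round(training_cost_usd(file_tokens), 2))  # 41.6
```

That lands in the same range as the quoted cost of over $45; the remaining gap is plausibly down to the 2.6 factor being a rounded value.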
