Is there a way to estimate the price of a training run? The CLI shows the cost at the beginning, but by that point the process has already started. Also, it would be useful to understand the components of the pricing. Based on scattered information on this forum, I am not very far from an “algorithm”, but the actual price is still approximately double what I calculate.
- Should I calculate with the validation records as well, or only with the training records?
- How can I estimate the number of tokens in the files? There is a language-dependent rule of thumb, word_nums = 75% token_nums or token_nums = 133% word_nums, but the training JSONL file contains many more characters as well, including the suggested stop sequences. Special characters are said to cost 1 token each, so using #***# or similar as a stop sequence might cost a lot.
- It is not mentioned in the pricing documentation, but the number of epochs might count, as those mean repeated training sessions on the same training data. The default number of epochs is 4.
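To make the token question concrete, here is a rough Python sketch of the heuristic described above: 1.33 tokens per word plus 1 token per special character. It is only an approximation, not a real tokenizer, so actual counts may differ. The field names "prompt" and "completion" and the function names are my assumptions about the JSONL layout, not something confirmed by the documentation.

```python
import json
import re

TOKENS_PER_WORD = 1.33  # rule of thumb from above; language dependent


def estimate_tokens(text: str) -> float:
    """Heuristic estimate: 1.33 tokens per word + 1 token per special character."""
    words = re.findall(r"[^\W_]+", text)        # runs of letters and digits
    specials = re.findall(r"[^\w\s]|_", text)   # punctuation and symbols, e.g. each char of #***#
    return TOKENS_PER_WORD * len(words) + len(specials)


def estimate_file_tokens(jsonl_lines) -> float:
    """Sum the estimate over all records; accepts any iterable of JSONL lines."""
    total = 0.0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed field names; adjust to match the actual training file.
        total += estimate_tokens(record.get("prompt", ""))
        total += estimate_tokens(record.get("completion", ""))
    return total
```

For a real file, something like `estimate_file_tokens(open("train.jsonl", encoding="utf-8"))` would work, where the file name is just a placeholder.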
So my calculation would be:
Prompt_token_nums = 1.33 * avg_prompt_words
Response_token_nums = 1.33 * avg_resp_words
Stop_token_nums = number of stop and other delimiting characters
Token_nums = Prompt_token_nums + Response_token_nums + Stop_token_nums
All_token_nums = Training_records * Token_nums
Total_cost = All_token_nums * Epoch_num * K_token_price / 1000
Does this seem correct?
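For reference, the same calculation written as a small Python function. This is only a sketch of my own formula above, so it inherits the same open questions (whether validation records and epochs are billed). The function name, parameter names, and the sample numbers in the usage note are placeholders, not real prices.

```python
def estimate_training_cost(
    training_records: int,
    avg_prompt_words: float,
    avg_resp_words: float,
    stop_token_nums: float,
    k_token_price: float,
    epoch_num: int = 4,           # default number of epochs mentioned above
    tokens_per_word: float = 1.33,  # language-dependent rule of thumb
) -> float:
    """Estimated cost following the formula in this post (an assumption, not official pricing)."""
    prompt_token_nums = tokens_per_word * avg_prompt_words
    response_token_nums = tokens_per_word * avg_resp_words
    # Tokens per record, including stop sequences and other delimiters.
    token_nums = prompt_token_nums + response_token_nums + stop_token_nums
    all_token_nums = training_records * token_nums
    # Price is quoted per 1000 tokens, and each epoch is assumed to be billed.
    return all_token_nums * epoch_num * k_token_price / 1000
```

For example, `estimate_training_cost(1000, 50, 30, 6, k_token_price=0.03)` estimates the cost for 1000 records with 50-word prompts, 30-word responses and 6 delimiter tokens each, at a made-up price of 0.03 per 1000 tokens.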