I was wondering about the max token limit of the different models that are fine-tunable. Here is what I found in the documentation: "Token limits depend on the model you select. For gpt-3.5-turbo-0125, the maximum context length is 16,385, so each training example is also limited to 16,385 tokens. For gpt-3.5-turbo-0613, each training example is limited to 4,096 tokens." My examples are each around 23,000 tokens. I was wondering: 1) What is the max token limit on the other models? 2) Are there any tricks or tips anyone can suggest to make the fine-tuning work without shortening the examples too much?
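For reference, here is a minimal sketch of how one might count tokens per example with tiktoken (assuming the chat-format JSONL used for fine-tuning and the cl100k_base encoding used by gpt-3.5-turbo; the file name is a placeholder, and exact counts add a few tokens of per-message overhead this ignores):

```python
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Count the approximate tokens in each training example.
with open("training_data.jsonl") as f:  # placeholder file name
    for i, line in enumerate(f):
        example = json.loads(line)
        n_tokens = sum(len(enc.encode(m["content"])) for m in example["messages"])
        print(f"example {i}: ~{n_tokens} tokens")
```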
Hi!
Unless someone corrects me, I believe that for the latest versions of babbage-002 and davinci-002, the max would also be 16,384 tokens. For fine-tuning gpt-4, if you do have or receive access, it would presumably be 8,192 (both limits could, and should, be stated more clearly in the fine-tuning documentation).
So under either scenario, you’d have to shorten your examples to fit those limits.
There's no real workaround: if the examples are too long they will get truncated, which risks negatively impacting your fine-tuning results.
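If it helps, a quick pre-flight check along these lines can flag oversize examples before upload so nothing gets silently truncated (a sketch only; 16,385 is the documented per-example limit for gpt-3.5-turbo-0125, and the file names are placeholders):

```python
import json
import tiktoken

MAX_TOKENS = 16_385  # documented per-example limit for gpt-3.5-turbo-0125
enc = tiktoken.get_encoding("cl100k_base")

kept, dropped = [], 0
with open("training_data.jsonl") as f:  # placeholder file name
    for line in f:
        example = json.loads(line)
        n = sum(len(enc.encode(m["content"])) for m in example["messages"])
        if n <= MAX_TOKENS:
            kept.append(line)
        else:
            dropped += 1

# Write only the examples that fit within the limit.
with open("training_data.filtered.jsonl", "w") as f:
    f.writelines(kept)

print(f"kept {len(kept)} examples, dropped {dropped} oversize ones")
```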
Thank you for the info.
I see. I was hoping someone knew of a fine-tunable model with a max token limit above 16k. Seems like that's it.
It’s on my wishlist, too
I know. It's unfortunate that the fine-tuning limits are smaller than the context you get when prompting the same model. You would think they could make it possible to fine-tune with larger token counts.
Question regarding this same topic of max token limits. Say I have this structure in each example in my training data: "Would you do x given Y? To do x, you should identify and rate A, B, C, D, E, and F in Y." Now, to get around the token limitation, if I break each of A, B, C, D, … into an individual example, would the model learn beyond the individual examples and understand that it needs to find all of them in one large Y? I'm afraid it won't get from the pieces to the whole if I train it on the pieces. Any suggestions? Ideas? Insights? Thank you.
I personally think it would not pick up the pattern of identifying multiple of A, B, C, D, … at once if the training only involves individual examples.
This is common behaviour in other training sets too. If a data set is dominated by too many similar examples, without balancing them against other kinds, the model will likely just replicate the pattern that dominates.
Rather than excluding material, do you have any way to shorten each example? For instance, you could split the long context into token-budgeted chunks, as in the sketch below.
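This is only a sketch of that idea: the budget, the prompt template, and the answer_for labeling function are illustrative assumptions, not anything from the docs, and whether per-chunk examples teach the whole-Y behaviour is exactly the open question above.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
BUDGET = 15_000  # headroom under the 16,385-token limit; adjust to taste

def chunk_by_tokens(text: str, budget: int) -> list[str]:
    """Split text into pieces of at most `budget` tokens each."""
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + budget]) for i in range(0, len(tokens), budget)]

def make_examples(y: str, answer_for) -> list[dict]:
    """Build one chat-format training example per chunk of Y.

    `answer_for` is a hypothetical function supplying the A, B, C, …
    labels for each chunk; you would plug in your own labeling here.
    """
    return [
        {
            "messages": [
                {"role": "user", "content": f"Would you do x given this part of Y?\n\n{part}"},
                {"role": "assistant", "content": answer_for(part)},
            ]
        }
        for part in chunk_by_tokens(y, BUDGET)
    ]
```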