What does it mean to limit tokens for fine tuning?

As I understood from the documentation. You can add information about the product to the assistant (that is, train it on your database). Then you can test it. If I don’t like something, I can use Fine-tuning to change the assistant a little so that he responds better in some specific situations.

I want to know about each of these models. The documentation says the following:
Token limits depend on the model you select. For gpt-3.5-turbo-1106, the maximum content length is 16,385 so beach training example is also limited to 16,385 tokens.
I understand it this way. At one point I can become an assistant during Fine-tuning, I can only transfer 16,385 tokens to train him.

Also, as I understood, during the creation of the assistant, I can upload as much information as I want into it?

Tell me, I didn’t understand the documentation a bit.

Hello and welcome to the community!

I think you understood everything correct.
Fine-tuning is used to change the way how they model responds and not to add knowledge.
When you fine-tune the model you are giving examples. For example 1) you are a expert in something 2) the user asks something about the area of expertise 3 the ideal answer would look like this…

And these three elements together have to fit into the context window of the model variant you are fine-tuning.

Yes, you can train the model on as many examples as you want. How many you will actually need and how often you repeat the same examples in your training (epochs) is up to you, but actually depends on the results you are getting. Advise is actually ranging from as low as 10 to several hundred thousands. So you will either need to look up examples that match your use case or explore the results going forward.

That’s when you start learning to fine-tune.
I suggest you create a very small set of examples and do an initial test before creating the whole training set.

Hope this helps!

1 Like

One has to make very clear a conflation that newcomers have made.

Fine-tune is a machine-learning technique on API for producing your own AI models, by refining the weights that are used in token production, and using a very large training file that has examples of how an AI should respond. A fine-tune model is only available within an API organization that created it.

GPTs and Assistants, instead are agents. They can operate more autonomously, making multiple calls to AI models, and have built-in features that supplement the AI model capabilities. ChatGPT GPTs only use the GPT-4 AI that is within ChatGPT Plus, and assistants can only use the newest AI models also introduced in November if one wants success. Fine-tune AI models are blocked from being used.

Training file example length refers to individual fine-tune conversations in full. It is the total length of one of the many chat examples that can be in a fine-tune training file to make the AI learn how to respond.

Token limits depend on the model you select. For gpt-3.5-turbo-1106, the maximum context length is 16,385 so each training example is also limited to 16,385 tokens. For gpt-3.5-turbo-0613, each training example is limited to 4,096 tokens.


OP has requested this stay open even with a selected solution.

And if I then transfer another file (.json). That is, I completed the first request, and now I’m completing the second one. Then will it be considered another request?

Fine-tune with a jsonl training file is a very deliberate process that takes costly resources - billed to you. It also takes intensive data preparation. It is not merely uploading files.

One file contains the entire training session, and then another validation file can track the learning process.


Without understanding the process and the capabilities, it is very easy to waste your money creating non-functional AI.

What does 16,385 currents mean for gpt-3.5-turbo-1106? This is as long as I can in a single file (.json) transfer the data. That is, I have a file (.json) with a context for 18,000 tokens, if I send it to Fine-tuning, I will get an error, because the gpt-3.5-turbo-1106 model has a maximum of 16,385,

Each model has a particular context window length.

This a space of memory that is both loaded with the input that you provide, and then the remaining space available can be used for forming an AI-created answer, one word at a time, that is inspired from continuing upon that input.

In generative AI language models, the basic way they operate is completion. This is similar to auto-completion on your phone when you are typing, but a much more advanced version.

Here, I place some text into the AI model context, and then it generates more tokens (in green).


(ignore the quality of what the AI wrote)

My input to the AI was 10 tokens, and then the AI continued writing for 50 more tokens (the limit I set). The total occupied 60 tokens of the model’s context length.

For this model with 4096 token context length, I could have furnished much more input or gotten much more response (paying for each token).

Fine-tune training is a quite different machine learning process, but is also informed by the model’s context length.

In total, the amount of language I place which will train the probabilities cannot exceed the size of the model (for OpenAI’s particular case). For chat completions fine-tune, this is the total simulated conversation that you wish to place at once as an example conversation completion.

And that’s the point: you are not training the AI on documents. You are training it on how to respond differently to user input, training it how to continue upon the context portion that has been pre-loaded. Teaching it which token sequences lead to others.

1 Like