Fine-Tuning | Multiple System Messages? & Limits?

I have extensive instructions on how to respond: steps to follow, structure, style, purpose, and so on. Are there any limits on how long the system message can be? And are there limits on the user-assistant messages?

Can I fine-tune based only on this, without user-assistant messages? Are there any alternative methods for feeding such instructions into a fine-tune?

Could I have multiple system messages?

As for the limits, the following applies:

Token limits

Token limits depend on the model you select. For gpt-3.5-turbo-0125, the maximum context length is 16,385 tokens, so each training example is also limited to 16,385 tokens. For gpt-3.5-turbo-0613, each training example is limited to 4,096 tokens. Examples longer than the default will be truncated to the maximum context length, which removes tokens from the end of the training example(s). To be sure that your entire training example fits in context, consider checking that the total token counts in the message contents are under the limit.
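
One rough way to check this is to count the tokens per example before uploading (a minimal sketch using the tiktoken library; the training file name is a placeholder, and the count ignores the small per-message formatting overhead of the chat format):

```python
import json
import tiktoken

# Encoding used by the gpt-3.5-turbo family
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

MAX_EXAMPLE_TOKENS = 16_385  # context limit for gpt-3.5-turbo-0125

def count_message_tokens(example: dict) -> int:
    """Approximate token count for one training example.

    Only the message contents are counted; the real total is slightly
    higher because the chat format adds a few tokens per message.
    """
    return sum(len(encoding.encode(m["content"])) for m in example["messages"])

# "training_data.jsonl" is a hypothetical file name
with open("training_data.jsonl") as f:
    for i, line in enumerate(f):
        tokens = count_message_tokens(json.loads(line))
        if tokens > MAX_EXAMPLE_TOKENS:
            print(f"Example {i} has ~{tokens} tokens and will be truncated")
```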

You cannot fine-tune purely based on the system message. You must also have user and assistant messages.
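
For reference, each line of the training file combines all three roles, roughly like this (a minimal sketch; the instruction text, prompts, and file name are placeholders):

```python
import json

# One hypothetical training example: the long instructions live in the system
# message, but the example still needs a user prompt and the assistant reply
# you want the model to learn.
example = {
    "messages": [
        {"role": "system", "content": "You are an assistant that follows <your long instructions>."},
        {"role": "user", "content": "Summarise this report for an executive audience."},
        {"role": "assistant", "content": "Here is the summary, in the required structure and style..."},
    ]
}

# Training files are JSONL: one example object per line.
with open("training_data.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```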

I am not sure what you mean by multiple system messages. You should just consolidate your instructions into a single, well-structured system message.


By multiple system messages I mean that I want to fine-tune my model again with new updates in the future.

Hypothetically speaking, you can fine-tune an existing fine-tuned model. That said, if you do consider updates to the system message, you have to be a bit cautious. The gist / main logic of the system message, i.e. what you are trying to get the model to do, should not fundamentally change. Smaller refinements should be fine, though.
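
If you do go that route, the job is started the same way as a normal fine-tune, just with the existing fine-tuned model ID as the base model (a sketch using the openai Python library; the file name and model ID are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Upload the new training data (placeholder file name)
training_file = client.files.create(
    file=open("updated_training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job on top of an existing fine-tuned model
# (placeholder model ID from a previous fine-tuning job)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="ft:gpt-3.5-turbo-0125:my-org::abc12345",
)
print(job.id)
```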


Thanks a lot!

I just came up with this idea. I assume it might be discouraged; if it is, what solutions exist?

I could add a system message on “the way to respond” with conversation examples for it, then add more system messages for other topics (structure, style, etc.), each with their own conversation examples.

These limits are not correct, unfortunately.

The limit for one training example depends on the model. There is no set rule as to how the tokens are allocated between the system, user, and assistant messages. The only constraint is, of course, the maximum number of output tokens, which for the newer gpt-3.5 models is 4,096.