GPT-4 internal system behavior: training/concatenation implementation

In the documentation and examples provided elsewhere, a typical query to GPT-4 consists of a dialog with a single system-role message, followed by assistant and user messages.

I assume the values from this dialog are concatenated into a single text prompt, which is then executed by a model fine-tuned to understand the structure of the concatenated document, just as I have been doing so far with GPT-3. Pretty simple stuff. Can you shed some light on exactly how this process happens?

Is there always a single system message followed by the dialog, or can system messages also appear during the dialog to enforce changes in the model's behavior? If there is only supposed to be a single system message, what happens when I add a second or third system message during the dialog? Is it appended to the initial system message, does it replace the initial system message, or is it ignored?

// single system message
system: behave a certain way
user: message
assistant: message
user: message
...

// multiple system messages
system: behave a certain way
user: message
assistant: message
user: message
system: from now on, behave in some other or additional way
user: message
...
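
To make the second case concrete, here is a minimal sketch of how I would send it through the current Python client (openai.ChatCompletion.create); the model name and message contents are just placeholders:

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Dialog with a second system message injected mid-conversation.
# Whether this is appended to, replaces, or is ignored relative to
# the first system message is exactly what I am asking about.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "behave a certain way"},
        {"role": "user", "content": "message"},
        {"role": "assistant", "content": "message"},
        {"role": "user", "content": "message"},
        {"role": "system", "content": "from now on, behave in some other or additional way"},
        {"role": "user", "content": "message"},
    ],
)
print(response.choices[0].message.content)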

I’m also interested to know how much distinction there is (if any) between the different roles in terms of execution and strength of influence, separate from the model itself, and how you implemented this. I’m researching this myself, and it appears that messages coming from the user role affect the results almost exactly as system messages do. In other words, it does not seem to matter whether the desired behavior (change) is put in a system or a user message.
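
This is roughly how I am testing it, as a sketch: the same question asked twice, with the behavior instruction placed once in the system role and once as a plain user message (instruction and question are placeholders):

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

INSTRUCTION = "Always answer in rhyming couplets."  # placeholder behavior
QUESTION = "Explain what an API key is."            # placeholder prompt

def ask(messages):
    response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content

# Variant A: the instruction lives in the system message.
as_system = ask([
    {"role": "system", "content": INSTRUCTION},
    {"role": "user", "content": QUESTION},
])

# Variant B: the same instruction is just another user message.
as_user = ask([
    {"role": "user", "content": INSTRUCTION},
    {"role": "user", "content": QUESTION},
])

print("system-role instruction:\n", as_system)
print("user-role instruction:\n", as_user)

In my tests so far, the two variants behave almost identically.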

The davinci models, and especially the instruct models, already struggled with this. If you prompted them with a dialog similar in structure to my example, you could put instructions into the prompt as if they were coming from the model itself, and it would make little or no difference. The instruct models are of course far too simple (too heavily fine-tuned on too small a dataset, I suppose) for the origin of an instruction to matter.

Based on my experience with all the GPT-3 models, especially when emulating a dialog, the position of a message within the document matters a lot for the effect it has on behavioral changes. This is of course because it is still a model that predicts the next token. Simply put, a message or instruction that occurs more recently (towards the end of the prompt) affects the response significantly more than one at the beginning, and even more so as the prompt gets longer.
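
With GPT-3 I work around this by re-asserting the instruction near the end of the prompt. Below is a rough sketch of the same idea in the chat format, re-injecting the system instruction just before the latest user message; whether this is the intended use of system messages is part of what I am asking (instruction text is a placeholder):

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

SYSTEM_INSTRUCTION = "Only ever reply with valid JSON."  # placeholder

def chat(history):
    # Re-assert the instruction near the end of the context, where
    # (in my experience with GPT-3) it carries the most weight.
    messages = (
        [{"role": "system", "content": SYSTEM_INSTRUCTION}]
        + history[:-1]
        + [{"role": "system", "content": SYSTEM_INSTRUCTION}]
        + history[-1:]
    )
    response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content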

That is why I would really like to know some details of this process. Ideally I would like to see some examples from the dataset to understand how it was trained and implemented internally. A lot of this information is not available in your documentation, which makes it a guessing game (not very “open”). If my beta members and I had more insight into this, we could engineer prompts much more effectively and safely, especially when the prompts are responsible for outputting code to be executed. It would also save the time spent trying to somehow reverse engineer this information.

Thank you.