I need to unpack that as I do not completely understand
By “attention capacity” you mean ho long it recalls the message with role being “system”?
I have not seen the term “attention tokens” before. I know about tokens. Tokens control the cost and rate of API service. But “attention tokens” are new to me.
When you say “remind it” do you mean resend a message with the “system” role?
The system message helps set the behavior of the assistant. If properly crafted, the system message can be used to set the tone and the kind of response by the model.
gpt-3.5-turbo-0301 does not always pay strong attention to system messages. Future models will be trained to pay stronger attention to system messages.
However this can be overcome by making the system instruction more explicit, detailed and comprehensive.
@raul_pablo_garrido_c is trying to refer to context length of the model but is wrong about how it works. If the system message is included in the conversation, and the request is valid, the model will respond to the chat completion request.
If the system message is in the beginning of the messages array in the API argument, would this “context” have less weight than if it were at the end of the messages array?