I need to unpack that as I do not completely understand
By “attention capacity” you mean ho long it recalls the message with role being “system”?
I have not seen the term “attention tokens” before. I know about tokens. Tokens control the cost and rate of API service. But “attention tokens” are new to me.
When you say “remind it” do you mean resend a message with the “system” role?
The system message helps set the behavior of the assistant. If properly crafted, the system message can be used to set the tone and the kind of response by the model.
gpt-3.5-turbo-0301 does not always pay strong attention to system messages. Future models will be trained to pay stronger attention to system messages.
However this can be overcome by making the system instruction more explicit, detailed and comprehensive.
@raul_pablo_garrido_c is trying to refer to context length of the model but is wrong about how it works. If the system message is included in the conversation, and the request is valid, the model will respond to the chat completion request.
If the system message is in the beginning of the messages array in the API argument, would this “context” have less weight than if it were at the end of the messages array?
I stumbled on this thread randomly. But I find that whenever my request exceeds the token limit, the system prompt is lost.
I found out about this because sometimes a Whisper API transcription spirals out into an endless list of character repetitions.
The transcription is sent to GPT4 for translation and my system prompt includes an instruction to omit character repetitions, though every time the aforementioned problem occurs, the repeated characters are not removed in the API response.
From this observation I started wondering why this happens, and I think it’s logical to conclude that the system prompt does not take precedence.
I don’t know if putting the system message at the end of the array would affect this though. I never thought of that, but it’s worth a try.
That would make sense, since the longer the overall content, the more “noise” there is. Try to cut the conversation short by implementing a sliding window (last 10 or so messages).
I came looking for descriptions that the context window is competing with the system prompt and you guys are thinking the same. I am wondering what you figured out since.