Do LLMs take instructions each time they respond in a chat?

I need clarification on whether the assistant receives its instructions each time it runs in the chat and responds to the user.
Is there any way to reduce the instruction tokens within the total input tokens each time the model responds in the chat?

Please let me know if anyone knows any details, however small, regarding this topic.

I have some small details…

A language model, such as those employed by OpenAI, is stateless: no information about the chat tokens or response tokens in the context window is preserved in the inference engine after you receive your response. The model produces its result from the total input, and the state is then freed for other calls. Every call you make is therefore independent, and the entire input that produces a response must be sent again for each new generation and sampling API call.
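To make that concrete, here is a minimal sketch (Python, using the openai SDK; the model name and prompts are just placeholders) of a chat loop where the caller, not the server, maintains the conversation state. The system instructions and the full history are resent on every call:

```python
# Minimal sketch: the API is stateless, so the client keeps the history
# and resends all of it with every request.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    # The "instructions" live here and are transmitted on every call.
    {"role": "system", "content": "You are a helpful assistant."},
]

for user_input in ["What is prompt caching?", "Does it cost less?"]:
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gpt-4o-mini",     # placeholder model name
        messages=messages,       # the ENTIRE history goes up again each time
    )
    reply = response.choices[0].message.content
    # Persisting the assistant turn locally is what makes the "chat" work.
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```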

There is a newer service feature, prompt caching (more accurately, context caching), that allows some of the precomputation done in producing the internal state from a sent input to be stored and reused when an identical input portion is sent again (for OpenAI, the match must begin at the very start of the input). This is cached not in a system-wide persistent database (unlike Google, where you can commit your own context cache to permanent storage); instead it relies on an expiring state held on the local datacenter unit to which follow-up calls from your organization are routed.
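As a sketch of how you would exploit this, assuming the Chat Completions usage object exposes `prompt_tokens_details.cached_tokens` as currently documented (the model name and system text below are placeholders): keep the long, static instructions at the very front so the prefix stays identical across calls, and vary only the trailing content.

```python
# Sketch: structure the prompt for cache hits and inspect how much was cached.
from openai import OpenAI

client = OpenAI()

# Static instructions first, unchanged between calls, so the cached prefix
# matches. (Placeholder text; OpenAI's cache also needs a sizable prompt.)
STATIC_SYSTEM = "You are a support bot. " + "Detailed policy text... " * 200

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": STATIC_SYSTEM},
        {"role": "user", "content": "How do refunds work?"},  # variable tail
    ],
)

usage = response.usage
# Guarded access, in case the SDK version doesn't populate these fields.
details = getattr(usage, "prompt_tokens_details", None)
cached = getattr(details, "cached_tokens", 0) if details else 0
print(f"{usage.prompt_tokens} prompt tokens, {cached} served from cache")
```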

The caching is now disclosed in the API response usage, and a 50% discount is offered on the cached input tokens when the cache is "hit".
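A quick back-of-the-envelope illustration of that discount, with made-up prices and token counts:

```python
# Hypothetical numbers only: rates and token counts are illustrative.
input_price_per_1k = 0.0025                    # assumed full input price / 1K tokens
cached_price_per_1k = input_price_per_1k / 2   # 50% discount on cached tokens

prompt_tokens = 10_000
cached_tokens = 8_000                          # portion reported as a cache hit
uncached_tokens = prompt_tokens - cached_tokens

cost = (
    (uncached_tokens / 1000) * input_price_per_1k
    + (cached_tokens / 1000) * cached_price_per_1k
)
full_cost = (prompt_tokens / 1000) * input_price_per_1k
print(f"Billed: ${cost:.4f} vs ${full_cost:.4f} without any cache hit")
```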

Reducing the "instruction" input can be done by occasionally discarding the oldest chat turns in an extended conversation session, but that may defeat the cache, since the initial input context changes.
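A minimal sketch of managing that trade-off, assuming a standard messages list with the system message first (the helper name and thresholds are my own): trim old turns in chunks rather than one at a time, so the cached prefix is only invalidated occasionally instead of on every new message.

```python
def trim_history(messages, max_turns=20, drop=4):
    """Keep the system message plus the most recent turns.

    Dropping several turns at once (drop=4) means the input prefix
    changes only occasionally, instead of on every single message,
    which preserves more prompt-cache hits between trims.
    """
    system, rest = messages[:1], messages[1:]
    while len(rest) > max_turns:
        rest = rest[drop:]  # discard the oldest few turns together
    return system + rest
```

In the chat loop above, you would call `messages = trim_history(messages)` after appending each assistant reply.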