Do LLMs take instructions each time they respond in a chat?

I need clarification on whether the assistant receives its instructions each time it runs in the chat and responds to the user.
Is there any way to reduce the instruction tokens within the total input tokens each time the model responds in the chat?

Please let me know if anyone knows any details, however small, regarding this topic.

I have some small details…

A language model, such as those employed by OpenAI, is stateless: no information about the chat tokens or response tokens in the context window is preserved in the inference engine after you receive your response. The model produces its result from the total input, and the state is then freed for other calls. Every call you make is therefore independent, and the entire input that produces a response must be sent again for each new generation and sampling API call.
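To make that concrete, here is a minimal sketch (Python, using the openai SDK; the model name and prompts are just placeholders) of a chat loop where the caller, not the server, maintains the conversation state. The system instructions and the full history are resent on every call:

```python
# Minimal sketch: the API is stateless, so the client keeps the history
# and resends all of it with every request.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    # The "instructions" live here and are transmitted on every call.
    {"role": "system", "content": "You are a helpful assistant."},
]

for user_input in ["What is prompt caching?", "Does it cost less?"]:
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gpt-4o-mini",     # placeholder model name
        messages=messages,       # the ENTIRE history goes up again each time
    )
    reply = response.choices[0].message.content
    # Persisting the assistant turn locally is what makes the "chat" work.
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```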

There is a newer service feature, prompt caching (more accurately, context caching), that allows some of the precomputation done in producing the internal state from a sent input to be stored and reused when an identical input portion is sent again (for OpenAI, the match must begin at the very start of the input). This is cached not in a system-wide persistent database (unlike Google, where you can commit your own context cache to permanent storage); instead it relies on an expiring state held on the local datacenter unit to which follow-up calls from your organization are routed.
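As a sketch of how you would exploit this, assuming the Chat Completions usage object exposes `prompt_tokens_details.cached_tokens` as currently documented (the model name and system text below are placeholders): keep the long, static instructions at the very front so the prefix stays identical across calls, and vary only the trailing content.

```python
# Sketch: structure the prompt for cache hits and inspect how much was cached.
from openai import OpenAI

client = OpenAI()

# Static instructions first, unchanged between calls, so the cached prefix
# matches. (Placeholder text; OpenAI's cache also needs a sizable prompt.)
STATIC_SYSTEM = "You are a support bot. " + "Detailed policy text... " * 200

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": STATIC_SYSTEM},
        {"role": "user", "content": "How do refunds work?"},  # variable tail
    ],
)

usage = response.usage
# Guarded access, in case the SDK version doesn't populate these fields.
details = getattr(usage, "prompt_tokens_details", None)
cached = getattr(details, "cached_tokens", 0) if details else 0
print(f"{usage.prompt_tokens} prompt tokens, {cached} served from cache")
```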

The caching is now disclosed in the API response usage, and a 50% discount is offered on the cached input tokens when the cache is "hit".
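A quick back-of-the-envelope illustration of that discount, with made-up prices and token counts:

```python
# Hypothetical numbers only: rates and token counts are illustrative.
input_price_per_1k = 0.0025                    # assumed full input price / 1K tokens
cached_price_per_1k = input_price_per_1k / 2   # 50% discount on cached tokens

prompt_tokens = 10_000
cached_tokens = 8_000                          # portion reported as a cache hit
uncached_tokens = prompt_tokens - cached_tokens

cost = (
    (uncached_tokens / 1000) * input_price_per_1k
    + (cached_tokens / 1000) * cached_price_per_1k
)
full_cost = (prompt_tokens / 1000) * input_price_per_1k
print(f"Billed: ${cost:.4f} vs ${full_cost:.4f} without any cache hit")
```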

Reducing the "instruction" input can be done by occasionally discarding the oldest chat turns in an extended conversation session, but that may defeat the cache, since the initial input context changes.
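A minimal sketch of managing that trade-off, assuming a standard messages list with the system message first (the helper name and thresholds are my own): trim old turns in chunks rather than one at a time, so the cached prefix is only invalidated occasionally instead of on every new message.

```python
def trim_history(messages, max_turns=20, drop=4):
    """Keep the system message plus the most recent turns.

    Dropping several turns at once (drop=4) means the input prefix
    changes only occasionally, instead of on every single message,
    which preserves more prompt-cache hits between trims.
    """
    system, rest = messages[:1], messages[1:]
    while len(rest) > max_turns:
        rest = rest[drop:]  # discard the oldest few turns together
    return system + rest
```

In the chat loop above, you would call `messages = trim_history(messages)` after appending each assistant reply.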