June:
Now it is being disclosed: there are no guarantees, there are timeouts, and the part of the input context that hits the cache is billed at a discount.
It is entirely passive: you could keep using the API without ever knowing it was in use, and it fits the repeated-chat-turns case well. That is in contrast to Gemini, where you take manual control of storing and reusing the model's pregenerated state.
So there isn’t much to learn, except maybe to pad a system prompt + fixed tool definitions to just over 1k tokens instead of just under, and to place anything that varies as late as possible in the prompt - at least in those API calls where there is a high chance you will hit the same context again soon.
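To make that concrete, here is a minimal sketch of how you might order a request so the stable prefix stays byte-identical across calls and the automatic cache can match it. This assumes an OpenAI-style messages list; `build_messages`, `SYSTEM_PROMPT`, and `TOOL_DEFINITIONS` are hypothetical names, not part of any SDK.

```python
# Hypothetical sketch: keep the cacheable prefix stable, push variation late.

SYSTEM_PROMPT = "You are a support assistant. ..."  # fixed text, ideally padded past ~1k tokens
TOOL_DEFINITIONS = [...]  # fixed tool schemas, also part of the stable prefix


def build_messages(history, user_turn, todays_date):
    # Stable prefix first: identical bytes on every call, so an
    # automatic prefix cache can recognise and reuse it.
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    # Prior chat turns next; each turn only appends, so every earlier
    # call remains a prefix of this one.
    messages.extend(history)
    # Anything that varies per request (dates, retrieved snippets)
    # goes as late as possible, after the reusable prefix.
    messages.append({
        "role": "user",
        "content": f"(today: {todays_date})\n{user_turn}",
    })
    return messages
```

The point is only the ordering: if the date were interpolated into the system prompt instead, every call would miss the cache from byte one.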