Yes, I completely understand where you are coming from.
In Scenario 3, if you are sending 2560 tokens for the system prompt, 2200 for tools and 1500 for user messages, that’s 6260 in total, and in the best case you can expect 1024 + (40 * 128) = 6144 cached tokens. That assumes you don’t make any changes in subsequent calls. If you start changing the user messages somewhere in the middle, then because KV caching is causal (i.e. each token depends only on the tokens that precede it), you can be sure that the second half of your user messages will miss the cache: everything from the first changed token onward has to be recomputed.
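If it helps, here is a rough Python sketch of that arithmetic. It assumes the cache only kicks in at 1024 tokens and then grows in 128-token steps, as in the figures above. The function name `max_cached_tokens` and the "edit lands exactly halfway through the user messages" case are just illustrations I made up, not anything from the API.

```python
MIN_CACHED = 1024  # assumed smallest prefix that can be cached
INCREMENT = 128    # assumed step size by which the cached prefix grows


def max_cached_tokens(matching_prefix_tokens: int) -> int:
    """Best-case cached tokens for a prefix that is unchanged since the last call."""
    if matching_prefix_tokens < MIN_CACHED:
        return 0
    extra_steps = (matching_prefix_tokens - MIN_CACHED) // INCREMENT
    return MIN_CACHED + extra_steps * INCREMENT


system, tools, user = 2560, 2200, 1500
total = system + tools + user

# Nothing changed between calls: the whole prompt is a matching prefix.
print(total, max_cached_tokens(total))  # 6260 6144

# Hypothetical edit halfway through the user messages: only the tokens
# before the first changed token still count as a matching prefix.
first_changed = system + tools + user // 2
print(first_changed, max_cached_tokens(first_changed))  # 5510 5504
```

So in that hypothetical case the best you could hope for drops from 6144 to 5504 cached tokens, and an edit earlier in the prompt would cut it further.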