How does prompt caching work?

Thanks to everyone, I finally understand it properly! Since the causal mask ensures that a token's keys and values never depend on future tokens, the KV cache for a prefix can be computed once and reused for arbitrary-length continuations, which improves compute efficiency!
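
For anyone else following this thread, here is a minimal NumPy sketch of that idea. It is a toy single-head attention with no positional encoding, and the names (`Wq`, `Wk`, `Wv`, `attention_step`, `attention_full`) are just made up for illustration, not any library's API. The point it demonstrates: because the causal mask keeps earlier keys/values independent of later tokens, attending to a cached prefix gives exactly the same result as recomputing the whole sequence.

```python
# Toy sketch: causal masking makes KV caching valid, because the keys/values
# of earlier tokens never depend on later tokens, so they can be reused.
import numpy as np

np.random.seed(0)
d = 8                                   # toy head dimension
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))

def attention_full(x):
    """Full causal self-attention over the whole sequence."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -np.inf              # causal mask: no attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def attention_step(x_new, kv_cache):
    """One new token attends to the cached keys/values plus its own."""
    q = x_new @ Wq
    kv_cache["k"] = np.vstack([kv_cache["k"], x_new @ Wk])
    kv_cache["v"] = np.vstack([kv_cache["v"], x_new @ Wv])
    scores = q @ kv_cache["k"].T / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ kv_cache["v"]

# A cached prefix of 5 tokens, then one new token appended.
prefix = np.random.randn(5, d)
new_token = np.random.randn(1, d)

cache = {"k": prefix @ Wk, "v": prefix @ Wv}    # built once, reused afterwards
out_cached = attention_step(new_token, cache)

out_full = attention_full(np.vstack([prefix, new_token]))[-1]
print(np.allclose(out_cached, out_full))        # True: the cache changes nothing
```

The `allclose` check printing `True` is the whole argument: the new token's output only needs the prefix's keys and values, which are unchanged no matter what comes after, so caching them is free of approximation.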

I hope this also works with OpenAI's models.