How does Prompt Caching work?

Hi @svelidanda and welcome to the community!

You may want to look at this thread for more details.

But in short: it’s more complicated than just thinking in terms of text/tokens. It comes down to how the keys and values (KV) are cached; these are part of the attention mechanism that GPT models are built on.
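To make the KV idea a bit more concrete, here is a rough toy sketch in plain Python. It is not how any real provider implements prompt caching: the `embed` and `project` functions, the width `D`, and the cache layout are all made up for illustration, and it models only a single attention layer with one head. What it tries to show is that the keys and values for an already-seen prefix can be stored and reused, so only the new tokens need their K/V computed, and that reuse only applies when the cached tokens are an exact prefix of the new prompt.

```python
import math

D = 4  # toy embedding width (an assumption for this sketch)

def embed(token_id):
    # Hypothetical deterministic "embedding"; a real model uses learned vectors.
    return [math.sin(token_id * (i + 1)) for i in range(D)]

def project(vec, scale):
    # Stand-in for a learned key or value projection matrix.
    return [scale * x for x in vec]

def kv_for(tokens):
    # Compute one key vector and one value vector per token.
    keys = [project(embed(t), 0.5) for t in tokens]
    values = [project(embed(t), 2.0) for t in tokens]
    return keys, values

def attend(query, keys, values):
    # Scaled dot-product attention of one query over all keys/values.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(D) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(D)]

def last_token_output(tokens, cache=None):
    """Return (attention output for the last token, updated cache, K/V computed)."""
    cached_tokens, keys, values = cache if cache else ([], [], [])
    # Reuse the cache only when it is an exact prefix of the new prompt.
    if tokens[:len(cached_tokens)] != cached_tokens:
        cached_tokens, keys, values = [], [], []
    new_tokens = tokens[len(cached_tokens):]
    new_keys, new_values = kv_for(new_tokens)
    keys = keys + new_keys
    values = values + new_values
    output = attend(embed(tokens[-1]), keys, values)
    return output, (list(tokens), keys, values), len(new_tokens)

# Warm the cache with a shared prefix, then extend it with one new token.
_, prefix_cache, computed_prefix = last_token_output([1, 2, 3])
_, _, computed_hit = last_token_output([1, 2, 3, 9], prefix_cache)
_, _, computed_miss = last_token_output([7, 2, 3, 9], prefix_cache)
print(computed_prefix, computed_hit, computed_miss)  # 3 1 4
```

In a real multi-layer model, the keys and values at deeper layers depend on every earlier token, which is why a change near the start of the prompt invalidates everything after it and why only a matching prefix can be reused.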
