Does anyone know how KV caching works with OpenAI API calls? I assume that when you use ChatGPT they save a ton of compute by caching KV matrices, but when using the API it's less clear what they do on the backend.
My hunch is that if you use threads with the API, they handle this automatically to reduce their costs / your latency, but if you don't use threads they treat each API call as an entirely new query. Mainly hoping to confirm that hypothesis here.
To give more context on why I care: if I alternate sending prompts between GPT-4 and Claude Haiku, then as the context grows, does OpenAI automatically stop trying to store a KV cache? Would my costs and latency go up? A rough sketch of the alternating pattern I mean is below.
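(The model names and the shared history are just my setup, not anything official; this is only meant to show the call pattern, assuming the standard openai and anthropic Python SDKs.)

```python
# Rough sketch of the alternating pattern I'm asking about.
# Model names are placeholders; both providers see the same growing history.
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()
anthropic_client = Anthropic()

history = []  # grows every turn, shared across both providers

def ask(user_text: str, turn: int) -> str:
    history.append({"role": "user", "content": user_text})
    if turn % 2 == 0:
        # Even turns go to OpenAI -- does their backend still keep a KV cache
        # for this conversation when every other turn is handled elsewhere?
        resp = openai_client.chat.completions.create(
            model="gpt-4", messages=history
        )
        reply = resp.choices[0].message.content
    else:
        # Odd turns go to Anthropic with the same growing history.
        resp = anthropic_client.messages.create(
            model="claude-3-haiku-20240307", max_tokens=512, messages=history
        )
        reply = resp.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
```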
Each API call is its own entity, even when initiated by Assistants, which still uses the same API models, just with their framework code instead of yours loading the context and catching and returning tool calls without external interaction.
There is certainly an opportunity to precompute states. Why should OpenAI recompute "You are ChatGPT" a million times an hour?
API calls don't give much surface area where the overhead of storage and retrieval could be worth it. They start anew at the instructions each time, and the attention states diverge immediately depending on whether my first input is "from apple documentation…" or "from banana documentation". A toy sketch of what I mean is below.
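(Toy illustration only, nothing to do with OpenAI's actual serving stack: a prefix cache can only skip recomputation for the longest shared token prefix, so everything after the first differing token has to be recomputed anyway.)

```python
# Toy illustration: prefix reuse stops at the first token that differs.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def shared_prefix_len(prompt_a: str, prompt_b: str) -> int:
    a, b = enc.encode(prompt_a), enc.encode(prompt_b)
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

system = "You are ChatGPT, a large language model trained by OpenAI.\n"
p1 = system + "from apple documentation: how do I open a window?"
p2 = system + "from banana documentation: how do I open a window?"

reusable = shared_prefix_len(p1, p2)
total = len(enc.encode(p2))
print(f"KV states reusable for {reusable} of {total} tokens; the rest must be recomputed.")
```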
One can speculate, or even probe with timing attacks, but the API models are a black box of secrets where language simply comes out to fulfill your input.
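If you did want to probe, the crude idea would be something like this (purely a sketch: the model name, trial counts, and the assumption that a cache hit would show up as lower time-to-first-token are all mine, and noise, batching, and routing could easily swamp any signal):

```python
# Crude timing probe: send the same long prompt repeatedly and compare
# time-to-first-token against prompts whose very first words differ.
# If a server-side prefix cache exists, the repeats *might* stream faster.
import time
from openai import OpenAI

client = OpenAI()
LONG_FILLER = "lorem ipsum " * 500  # long shared tail so any prefix reuse matters

def time_to_first_token(prompt: str) -> float:
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
        stream=True,
    )
    for _ in stream:  # first chunk arriving is our proxy for prefill time
        break
    return time.perf_counter() - start

repeat = [time_to_first_token("from apple documentation: " + LONG_FILLER) for _ in range(5)]
fresh = [time_to_first_token(f"from banana documentation {i}: " + LONG_FILLER) for i in range(5)]
print("identical prompts:", [round(t, 2) for t in repeat])
print("varying prompts:  ", [round(t, 2) for t in fresh])
```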
Yeah, I figure only an OpenAI employee can answer this one. Their docs mention that some prompts are more KV-cache efficient, so they definitely do something, but ultimately it would take someone on the inside to confirm.