To fulfill a task, my system breaks inference down into separate pieces. This lets me tackle the task in parallel, with specialized prompts for specialized agents that collaborate on the result. So my inference isn't the typical "conversation" where user/assistant turns accumulate over time.
Some of the prompts are long: they range from 500 to 13,000 tokens, average around 4,000 tokens, and a typical "long" prompt is about 8,000 input tokens. Task-specific variables are injected into the prompt, typically starting around 50-60% of the way in, so the leading portion of the prompt is identical across calls. This means I should technically be able to benefit from caching for roughly 50% of my input tokens.
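For context, this is roughly how one of these prompts is assembled (the names and contents below are placeholders, not my actual prompts): the static instructions come first and the task variables are appended at the end, so repeated calls share the same leading tokens.

```python
# Hypothetical sketch of the prompt layout: a fixed instruction prefix
# (a few thousand tokens in practice) followed by the injected task variables.
STATIC_AGENT_INSTRUCTIONS = "You are a specialized agent...\n"  # fixed text, identical on every call

def build_prompt(task_variables: dict) -> str:
    # Static prefix first, task-specific variables last, so the first
    # ~50-60% of input tokens is the same from call to call.
    variable_block = "\n".join(f"{key}: {value}" for key, value in task_variables.items())
    return f"{STATIC_AGENT_INSTRUCTIONS}\n## Task context\n{variable_block}"
```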
After experimenting, I noticed that CompletionUsage.prompt_tokens_details.cached_tokens is always 0. My assumption is that OpenAI only keeps one cached prompt, meaning that if you alternate between two different prompts, even repeatedly, you never benefit from caching. Is this understanding correct?
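For reference, here is roughly how I'm reading that field (the model name and prompt content are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

long_prompt = "...one of the long agent prompts: static prefix + task variables..."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": long_prompt}],
)

usage = response.usage
# Even when the exact same long prefix was sent moments earlier,
# cached_tokens always comes back as 0.
print(usage.prompt_tokens, usage.prompt_tokens_details.cached_tokens)
```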
If my understanding is correct, what are possible workarounds?
One thing I was considering is splitting my project across several API keys (one key per prompt) to see whether the cache is scoped per key (we currently use a single key and track inference accounting with a separate analytics platform). But I suspect keys are only used by OpenAI for more granular reporting and rate limiting.
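If I do run that experiment, it would look something like this (the environment variable names and agent names are hypothetical): one client per prompt family, each on its own key.

```python
import os
from openai import OpenAI

# One client per prompt "family", each with its own API key (hypothetical
# environment variable names), to test whether the cache is per key.
clients = {
    "planner": OpenAI(api_key=os.environ["OPENAI_KEY_PLANNER"]),
    "executor": OpenAI(api_key=os.environ["OPENAI_KEY_EXECUTOR"]),
}

def run_agent(agent_name: str, prompt: str):
    return clients[agent_name].chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
```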