Prompt Caching Not Applied When Schema Changes

I noticed that the OpenAI prompt caching mechanism doesn’t seem to apply when there’s a slight change to the schema, even if the prompt itself remains the same. I expected that if the prompt is the same, or at least starts with the same content, prompt caching would be used regardless of schema changes.

Steps to reproduce the issue

import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const openai = new OpenAI();

// fourThousandTokenStoryAboutPirates is a ~4,000-token story defined elsewhere.
export async function testPromptCaching() {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-2024-08-06",
    messages: [
      {
        role: "user",
        content: `Summarize this story about pirates in less than 100 words: ${fourThousandTokenStoryAboutPirates}`,
      },
    ],
    response_format: zodResponseFormat(
      z.object({
        summary: z.string(),
      }),
      "description",
    ),
  });
  console.log("usage", completion.usage);
}

  1. Invoke the function
  usage: {
    prompt_tokens: 1246,
    completion_tokens: 139,
    total_tokens: 1385,
    prompt_tokens_details: { cached_tokens: 0 },
    completion_tokens_details: { reasoning_tokens: 0 }
  },
  2. Invoke the function again. Notice that cached_tokens are being used correctly
  usage: {
    prompt_tokens: 1246,
    completion_tokens: 131,
    total_tokens: 1377,
    prompt_tokens_details: { cached_tokens: 1024 },
    completion_tokens_details: { reasoning_tokens: 0 }
  },
  3. Swap out summary for synopsis in the schema and invoke the function again. Notice that the cache is not used (this is the issue)
  usage: {
    prompt_tokens: 1251,
    completion_tokens: 130,
    total_tokens: 1381,
    prompt_tokens_details: { cached_tokens: 0 },
    completion_tokens_details: { reasoning_tokens: 0 }
  },

Welcome to the community @dyeoman2

This is a very good question.

From the docs:

What can be cached

  • Structured outputs: The structured output schema serves as a prefix to the system message and can be cached.

This means that changing the structured output schema, which itself serves as a prefix to the system message, results in a cache miss, because the cached prefix has changed.

Structuring Prompts

Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.
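
As a rough illustration (reusing the openai, z, and zodResponseFormat setup from the report above; the names here are stand-ins, not part of the original report), keeping the instructions and schema byte-identical across calls and passing only the variable story text last means every request shares the same cacheable prefix:

// Static instructions and a fixed schema form a stable, cacheable prefix.
const staticInstructions =
  "Summarize the following story about pirates in less than 100 words.";

const fixedSchema = z.object({
  summary: z.string(),
});

export async function summarize(story: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-2024-08-06",
    messages: [
      // Static content first...
      { role: "system", content: staticInstructions },
      // ...variable content last.
      { role: "user", content: story },
    ],
    // The schema must be identical between requests to stay cacheable.
    response_format: zodResponseFormat(fixedSchema, "summary"),
  });
  console.log("cached", completion.usage?.prompt_tokens_details?.cached_tokens);
}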

Thanks @sps. It would be more intuitive if the structured output schema were the last thing considered for prompt caching, not the first.

To work around this issue, you can place the content you want to cache at the beginning of the schema as shown below.

const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "user",
      content: `Summarize the story about pirates in 12 words.`,
    },
  ],
  response_format: zodResponseFormat(
    z.object({
      // The large static story rides along as a field description, so it
      // becomes part of the schema prefix that gets cached.
      contentToCache: z.boolean().describe(storyAboutPirates),
      synopsis: z.string().describe(`Summarize the story in 12 words.`),
    }),
    "synopsis",
  ),
});
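
This presumably works because cache hits are prefix-based: with the large static description placed first in the schema, later fields can change without invalidating the cached portion, as long as the shared prefix stays long enough to be cacheable.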

I would love to see a solution for this as well.

In my use case, I have a bunch of images that I am asking various questions about. The structured output response schemas vary with the type of question, but the images stay the same, and I would love to be able to cache them.
For what it’s worth, the workaround suggested above (thanks @dyeoman2) appears to work only with text, not with images.
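
To make the scenario concrete, here is a rough sketch of the pattern described above (question, imageUrl, and schemaForThisQuestionType are hypothetical stand-ins). Because the schema varies per question type, the prefix differs between requests, so the identical images never hit the cache:

// Hypothetical sketch: identical image, varying schema per question type.
// Since the schema is serialized as a prefix to the system message, changing
// it between requests prevents cache hits even though the image is unchanged.
const completion = await openai.chat.completions.create({
  model: "gpt-4o-2024-08-06",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: question },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ],
  response_format: zodResponseFormat(schemaForThisQuestionType, "answer"),
});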