Prompt Caching Not Applied When Schema Changes

I noticed that the OpenAI prompt caching mechanism doesn’t seem to apply when there’s a slight change to the schema, even if the prompt itself remains the same. I expected that if the prompt is the same, or at least starts with the same content, prompt caching would be used regardless of schema changes.

Steps to reproduce the issue

import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const openai = new OpenAI();

// fourThousandTokenStoryAboutPirates is a ~4,000-token story defined elsewhere.
export async function testPromptCaching() {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-2024-08-06",
    messages: [
      {
        role: "user",
        content: `Summarize this story about pirates in less than 100 words: ${fourThousandTokenStoryAboutPirates}`,
      },
    ],
    response_format: zodResponseFormat(
      z.object({
        summary: z.string(),
      }),
      "description",
    ),
  });
  console.log("usage", completion.usage);
}

  1. Invoke the function
  usage: {
    prompt_tokens: 1246,
    completion_tokens: 139,
    total_tokens: 1385,
    prompt_tokens_details: { cached_tokens: 0 },
    completion_tokens_details: { reasoning_tokens: 0 }
  },
  2. Invoke the function again. Notice that cached_tokens are being used correctly
  usage: {
    prompt_tokens: 1246,
    completion_tokens: 131,
    total_tokens: 1377,
    prompt_tokens_details: { cached_tokens: 1024 },
    completion_tokens_details: { reasoning_tokens: 0 }
  },
  3. Swap out summary for synopsis in the schema and invoke the function again. Notice that the cache is not used (this is the issue)
  usage: {
    prompt_tokens: 1251,
    completion_tokens: 130,
    total_tokens: 1381,
    prompt_tokens_details: { cached_tokens: 0 },
    completion_tokens_details: { reasoning_tokens: 0 }
  },

Welcome to the community @dyeoman2

This is a very good question.

From the docs:

What can be cached

  • Structured outputs: The structured output schema serves as a prefix to the system message and can be cached.

This means that changing the structured output schema, which itself serves as a prefix to the system message, results in a cache miss, because the cached prefix has changed.

Structuring Prompts

Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.
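
As a rough illustration (reusing the openai, z, and zodResponseFormat setup from the report above; the names here are stand-ins, not part of the original report), keeping the instructions and schema byte-identical across calls and passing only the variable story text last means every request shares the same cacheable prefix:

// Static instructions and a fixed schema form a stable, cacheable prefix.
const staticInstructions =
  "Summarize the following story about pirates in less than 100 words.";

const fixedSchema = z.object({
  summary: z.string(),
});

export async function summarize(story: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-2024-08-06",
    messages: [
      // Static content first...
      { role: "system", content: staticInstructions },
      // ...variable content last.
      { role: "user", content: story },
    ],
    // The schema must be identical between requests to stay cacheable.
    response_format: zodResponseFormat(fixedSchema, "summary"),
  });
  console.log("cached", completion.usage?.prompt_tokens_details?.cached_tokens);
}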

Thanks @sps. It would be more intuitive if the structured output schema were the last thing considered for prompt caching, not the first.

To work around this issue, you can place the content you want to cache at the beginning of the schema as shown below.

const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "user",
      content: `Summarize the story about pirates in 12 words.`,
    },
  ],
  response_format: zodResponseFormat(
    z.object({
      // The large static story rides along as a field description, so it
      // becomes part of the schema prefix that gets cached.
      contentToCache: z.boolean().describe(storyAboutPirates),
      synopsis: z.string().describe(`Summarize the story in 12 words.`),
    }),
    "synopsis",
  ),
});
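
This presumably works because cache hits are prefix-based: with the large static description placed first in the schema, later fields can change without invalidating the cached portion, as long as the shared prefix stays long enough to be cacheable.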

I would love to see a solution for this as well.

In my use case, I have a bunch of images that I am asking various questions about. The structured output response schemas vary with the type of question, but the images stay the same, and I would love to be able to cache them.
For what it’s worth, the workaround suggested above (thanks @dyeoman2) appears to work only with text, not with images.
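
To make the scenario concrete, here is a rough sketch of the pattern described above (question, imageUrl, and schemaForThisQuestionType are hypothetical stand-ins). Because the schema varies per question type, the prefix differs between requests, so the identical images never hit the cache:

// Hypothetical sketch: identical image, varying schema per question type.
// Since the schema is serialized as a prefix to the system message, changing
// it between requests prevents cache hits even though the image is unchanged.
const completion = await openai.chat.completions.create({
  model: "gpt-4o-2024-08-06",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: question },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ],
  response_format: zodResponseFormat(schemaForThisQuestionType, "answer"),
});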