I noticed that the OpenAI prompt caching mechanism doesn't seem to apply when there is even a slight change in the response schema, despite the prompt itself remaining the same. My expectation is that if the prompt is identical, or at least starts with the same content, prompt caching should kick in regardless of schema changes.
Steps to reproduce the issue
```ts
import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

const openai = new OpenAI();

// fourThousandTokenStoryAboutPirates is a ~4,000-token string defined elsewhere.
export async function testPromptCaching() {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-2024-08-06",
    messages: [
      {
        role: "user",
        content: `Summarize this story about pirates in less than 100 words: ${fourThousandTokenStoryAboutPirates}`,
      },
    ],
    response_format: zodResponseFormat(
      z.object({
        summary: z.string(),
      }),
      "description",
    ),
  });
  console.log("usage", completion.usage);
}
```
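For context, my (possibly imperfect) understanding is that zodResponseFormat serializes the Zod schema into a json_schema response format on the request body, so renaming the field only touches the response_format block while the messages prefix stays byte-for-byte identical. A rough sketch of the relevant request shape, under that assumption:

```ts
// Assumed (not verified) shape of the relevant parts of the request body.
const requestSketch = {
  model: "gpt-4o-2024-08-06",
  // This prefix is identical across every invocation below.
  messages: [
    { role: "user", content: "Summarize this story about pirates in less than 100 words: ..." },
  ],
  // Only this block differs when `summary` is renamed to `synopsis`.
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "description",
      strict: true,
      schema: {
        type: "object",
        properties: { summary: { type: "string" } },
        required: ["summary"],
        additionalProperties: false,
      },
    },
  },
};
```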
- Invoke the function:
```
usage: {
  prompt_tokens: 1246,
  completion_tokens: 139,
  total_tokens: 1385,
  prompt_tokens_details: { cached_tokens: 0 },
  completion_tokens_details: { reasoning_tokens: 0 }
},
```
- Invoke the function again. Notice that `cached_tokens` are now being used correctly:
```
usage: {
  prompt_tokens: 1246,
  completion_tokens: 131,
  total_tokens: 1377,
  prompt_tokens_details: { cached_tokens: 1024 },
  completion_tokens_details: { reasoning_tokens: 0 }
},
```
- Swap out `summary` for `synopsis` in the Zod schema and invoke the function again. Notice that the cache is not used (this is the issue; see the sketch of the change after the output below):
```
usage: {
  prompt_tokens: 1251,
  completion_tokens: 130,
  total_tokens: 1381,
  prompt_tokens_details: { cached_tokens: 0 },
  completion_tokens_details: { reasoning_tokens: 0 }
},
```
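For clarity, this sketch shows the only change made before the third invocation, namely the field name in the Zod schema; the user message, and therefore the prompt prefix, stays exactly the same.

```ts
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

// The only edit for the third invocation: rename the schema field.
// Everything else in testPromptCaching is unchanged.
const responseFormat = zodResponseFormat(
  z.object({
    synopsis: z.string(), // previously `summary`
  }),
  "description",
);
```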