I was doing repetitive testing of text completions with the same system and user prompt pair. I then changed the system prompt so that it would produce a different response format. However, when I submitted the same user prompt, the text completion call returned output in the old response format, as if I had never changed the system prompt. I tried the exact same test 3 times, and each time I got the same “stale” output format.
I began to wonder if this was related to the new token caching feature, and sure enough, when I changed the user prompt, the API suddenly returned a response in the format dictated by the revised system prompt.
So is this somehow related to the token caching feature? If so, how can I “flush” OpenAI’s cache to avoid this in the future?
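For reference, here is roughly what my test loop looked like. This is a minimal sketch using the official `openai` Python client; the model name and prompt strings are placeholders, and note that caching only kicks in for prompts above roughly 1,024 tokens:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "..."  # long system prompt that dictates the output format
USER_PROMPT = "..."    # the user prompt I kept resubmitting

for run in range(3):
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": USER_PROMPT},
        ],
    )
    details = response.usage.prompt_tokens_details
    # cached_tokens > 0 means part of the prompt prefix was served from cache
    print(f"run {run}: cached_tokens={details.cached_tokens}")
    print(response.choices[0].message.content)
```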
I was just trying to replicate this, and oddly enough I observed similar behaviour.

I created a very detailed prompt plus example (approx. 1,400 tokens) and ran it a few times with a specific user message - let’s refer to it as user message A. As expected, the caching came into effect.

I then changed one instruction regarding the output format, and also converted the example from bullets/numbered lists into continuous prose. I first ran this with the same user message A. While the chat completion response object showed that caching was no longer in effect, I still received the response in the old format. I then repeated the API call with new user messages, all of which came back in the new format. After a few API calls I tried user message A again, but it kept returning the response in the old bulleted/numbered format, even though the new prompt with the updated format was by then cached.
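The cache indicator I was reading is the `cached_tokens` count in the usage object. A small helper like this (again assuming the official Python client, with field names per the current usage schema) is enough to check it:

```python
def cache_hit(response) -> bool:
    """Return True if any prompt tokens were served from OpenAI's cache."""
    details = getattr(response.usage, "prompt_tokens_details", None)
    return bool(details and details.cached_tokens)
```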
Thanks for trying that. I’m wondering if this is actually a bug. If OpenAI keyed their cache on the user prompt alone, instead of on the combined system and user prompts, we would expect to see exactly this behavior. I doubt they do that explicitly, but there may be some path through their caching code that ends up ignoring the system prompt. Either way, it’s anomalous behavior and does create problems for us.
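On the “flush” question: there is no documented way to clear the cache, but since prompt caching matches on the exact prefix of the request, one workaround sketch is to force a cache miss by prepending a unique, throwaway marker to the system prompt. The bracketed nonce format here is just an invented convention:

```python
import uuid

def cache_busted(system_prompt: str) -> str:
    # Changing the very first characters of the prompt breaks the prefix
    # match, so the request cannot be served from a previously cached prefix.
    return f"[test-run {uuid.uuid4()}]\n{system_prompt}"
```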
There is another thread reporting a similar issue.
I wonder if System and User prompts are cached separately?