Will the newly announced prompt caching work with my Assistants API calls? Currently I'm spending around 3k tokens per run, and most of that comes from the same extensive instructions being re-sent every time to make sure the responses are usable.
I'd also appreciate clarity on this. I similarly use the same long prompt at the start of threads, so it would be really helpful from a cost standpoint if these were cached.
Hello, welcome both of you.
It's not entirely clear to me either, but from what I can tell from the wording in the docs, there is support for Assistants interactions.
https://platform.openai.com/docs/guides/prompt-caching/what-can-be-cached
And if Assistants aren't supported yet, they certainly will be eventually.
My experience so far (2024-10-29) is that caching does not work with the Assistants API. The `prompt_tokens_details` field is missing from the `usage` object returned by the run endpoint. By contrast, it is provided on the chat completions endpoint even when the cached token count is 0.
I imagine it is only a matter of time before it is implemented; the underlying thread objects from the Assistants API seem to offer even fancier caching potential.
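For anyone who wants to check this themselves, here's a minimal sketch on the chat completions side using the official Python SDK (the model name and prompt are just placeholders, and I'm assuming an SDK version recent enough to expose `prompt_tokens_details`):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Caching only kicks in above roughly 1024 prompt tokens, so repeat a long
# static instruction block to make the prefix cacheable.
long_instructions = "You are a helpful assistant. Always answer concisely. " * 200

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": long_instructions},
        {"role": "user", "content": "Summarize your instructions in one line."},
    ],
)

# On chat completions, prompt_tokens_details is present even when
# cached_tokens is 0; on an Assistants run it currently isn't returned.
usage = response.usage.model_dump()
details = usage.get("prompt_tokens_details") or {}
print("prompt tokens:", usage.get("prompt_tokens"))
print("cached tokens:", details.get("cached_tokens", 0))
```

Running it twice in a row with the same system prompt should show `cached_tokens` > 0 on the second call if caching applied.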
As of the date of this reply, the Assistants API does return the cached tokens, but under `prompt_token_details` instead of `prompt_tokens_details` (note the missing s in the word "token").
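In case it's useful, here's a small sketch of how I'd read the cached token count from a run defensively, so it works whichever spelling the API returns (the thread and assistant IDs are placeholders, and `create_and_poll` is from the Python SDK's beta Assistants helpers):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder IDs for an existing assistant and thread that carry a long,
# repeated instruction prefix.
run = client.beta.threads.runs.create_and_poll(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
)

# usage is only populated once the run reaches a terminal state.
usage = run.usage.model_dump() if run.usage else {}

# The Assistants API has been observed returning "prompt_token_details"
# (no s), while chat completions uses "prompt_tokens_details" -- check both.
details = (
    usage.get("prompt_token_details")
    or usage.get("prompt_tokens_details")
    or {}
)
print("prompt tokens:", usage.get("prompt_tokens"))
print("cached tokens:", details.get("cached_tokens", 0))
```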