Does Prompt Caching work with the Assistants API?

Will the newly announced prompt caching work with my Assistants API calls? Currently I'm spending around 3k tokens per run, and most of that comes from the same extensive instructions being re-sent every time to ensure the responses are usable.

4 Likes

I’d also appreciate clarity on this. I similarly use the same long prompt at the start of threads, so it would be really helpful from a cost standpoint if it were cached.

1 Like

Hello, and welcome to both of you.

It’s not very clear to me either, but from what I can tell from the language in the docs, there is support for Assistants interactions.

https://platform.openai.com/docs/guides/prompt-caching/what-can-be-cached

Even if Assistants aren’t supported yet, they surely will be eventually.

1 Like

My experience so far (2024-10-29) is that caching does not work with the Assistants API. The usage returned by the /run endpoint does not include a prompt_tokens_details field. By contrast, the chat completions endpoint does provide it, even when cached_tokens is 0.
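
For anyone who wants to check this themselves, here is a minimal sketch using the openai Python SDK (v1.x). It assumes you already have an assistant created; ASSISTANT_ID is a placeholder, and whether usage.prompt_tokens_details appears may depend on your SDK version.

```python
from openai import OpenAI

client = OpenAI()

# Chat Completions: usage.prompt_tokens_details is populated,
# with cached_tokens reported even when it is 0.
chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(chat.usage.prompt_tokens_details)

# Assistants API: inspect the usage on a completed run.
# "ASSISTANT_ID" is a placeholder for an assistant you created earlier.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Hello"
)
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id="ASSISTANT_ID"
)
print(run.usage)  # in my tests, no cached-token breakdown was returned here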

I imagine it is only a matter of time before it is implemented - the underlying thread objects from the Assistants API seem to offer even richer caching potential.

1 Like

As of the date I posted this reply, the Assistants API does return the cached tokens, but under prompt_token_details instead of prompt_tokens_details (note the missing "s" in "token").
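
Until the naming settles, a small helper like the one below can tolerate either field name. This is just a sketch of my own (the function name is mine, not from the SDK), and it assumes the run has completed so that usage is populated:

```python
def cached_tokens_from_run(run) -> int:
    """Return the cached prompt token count from a run's usage, accepting
    either prompt_token_details or prompt_tokens_details as the field name."""
    usage = getattr(run, "usage", None)
    if usage is None:
        return 0
    details = getattr(usage, "prompt_token_details", None) or getattr(
        usage, "prompt_tokens_details", None
    )
    if details is None:
        return 0
    # Depending on SDK version, details may be a model object or a plain dict.
    if isinstance(details, dict):
        return details.get("cached_tokens", 0) or 0
    return getattr(details, "cached_tokens", 0) or 0
```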