Caching a system prompt to facilitate interaction between user and LLM

In a dialogue with GPT-4o using the OpenAI API, it would be advantageous for both cost and latency to cache a system prompt that incorporates a large document the LLM consults each time it answers a question. The caching would take place on OpenAI's side and would make the interaction stateful. Any ideas?
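For reference, here is roughly the stateless pattern I am trying to avoid: the whole document is resent as part of the system prompt on every call, so all of those prompt tokens are paid for on each turn (a minimal sketch with the OpenAI Python SDK; the file path and question are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# The large document the model should consult on every turn.
# Placeholder path: in practice this is the real reference document.
with open("reference_document.txt") as f:
    large_document = f.read()

def ask(question: str) -> str:
    # Without caching, the full document is sent (and billed) on every request.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using this document:\n\n{large_document}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("What does section 2 of the document say?"))
```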

You can do this with the Assistants API. Once you set up the system prompt (instructions), knowledge files (vector store), and tools, and have created a thread, you basically just send one message each time you interact with it. Of course, you still pay for all the tokens used, but you are only sending one message per turn.
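Roughly, the flow looks like this (a minimal sketch with the OpenAI Python SDK; the assistant name, instructions, and question are illustrative placeholders):

```python
from openai import OpenAI

client = OpenAI()

# One-time setup: the instructions play the role of the system prompt,
# and a file_search tool can be used with a vector store of knowledge files
# (in practice you would attach the store via tool_resources).
assistant = client.beta.assistants.create(
    name="doc-assistant",  # illustrative name
    model="gpt-4o",
    instructions="Answer questions using the attached document.",
    tools=[{"type": "file_search"}],
)

# A thread holds the conversation state on OpenAI's side.
thread = client.beta.threads.create()

# Per turn: add one user message and run the assistant on the thread.
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What does the document say about pricing?",
)
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Read back the assistant's latest reply from the thread.
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```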


All of these services use similar libraries to implement the transformer used in their models. We're starting to see other services like Gemini and Claude add caching support, so it's likely just a matter of time before OpenAI adds a similar feature.
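For example, Anthropic's prompt caching (at the time of writing, behind a beta header) lets you mark a large system block as cacheable, which is roughly the feature being asked for here (a minimal sketch; the model name, document variable, and question are placeholders):

```python
import anthropic

client = anthropic.Anthropic()

large_document = "..."  # the big reference document

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # example model name
    max_tokens=1024,
    system=[
        {"type": "text", "text": "Answer questions using the document below."},
        {
            "type": "text",
            "text": large_document,
            # Mark the large block as cacheable so repeated calls can reuse it.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}],
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)
print(response.content[0].text)
```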

UPDATE:
Right after responding to this I stumbled upon an interview with Jeremy Howard (Answer.ai and Fast.ai) in which he discussed the same topic. Jeremy suspects we're going to start seeing all of the model providers add some form of KV caching. It's just too big of a performance boost not to offer in some form.


This is interesting. I have two follow-up questions:

  1. My use case is to always just send a system + user prompt (so no thread). Would it still give me some caching of the system prompt?
  2. Is the Assistants API covered in LangSmith in any way?