How to cache LLM responses in recent LangChain versions for OpenAI GPT-4

I'm making an FAQ bot using the latest LangChain version, with pgvector as my vector store and GPT-4 (gpt-4-1106-preview).

I’ve looked for caching methods, but most of what I found are very old posts, and the example in the official documentation doesn’t work. I’ve also looked into GPTCache, but that project hasn’t been active for a while.

I want to save some API calls and also improve response time for repeated and similar questions.

Can anyone point me to any projects or resources on how to cache LLM responses?

Thank you in advance


A really fun workaround for this that I’ve used and proposed before is to build fairly advanced logging into the program you’re running: capture the initiations, inputs, and outputs of every interaction, then have the model itself translate them into a natural-language story of the events that occurred, written so it’s easy to read at a human level. From that generated story you can extract whatever you’d like to cache. For instance, you can set up rules like “whenever X occurs in this story, place X into file Y” and have that file serve as your cache, or a database, or however you’d like to store it. A rough sketch of that idea follows.
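
Here is a minimal, hypothetical sketch of that workaround in Python, assuming the OpenAI Python client (v1.x) and gpt-4-1106-preview. The helper names (`log_interaction`, `narrate_and_cache`), the `CACHE:` marker convention, and the `story_cache.json` file are all made up for illustration; they aren’t part of LangChain or any library.

```python
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()
CACHE_FILE = Path("story_cache.json")   # hypothetical cache store
interaction_log: list[dict] = []        # raw inputs/outputs of each interaction


def log_interaction(prompt: str, response: str) -> None:
    """Record every prompt/response pair as it happens."""
    interaction_log.append({"prompt": prompt, "response": response})


def narrate_and_cache() -> None:
    """Have the model retell the log as a readable story, then extract
    the question/answer pairs we want to reuse into the cache file."""
    story = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[
            {
                "role": "system",
                "content": (
                    "Retell the following interaction log as a short, "
                    "human-readable story of the events that occurred. Each "
                    "time a user question and its final answer appear, also "
                    "emit a line starting with 'CACHE:' followed by JSON "
                    'like {"q": "...", "a": "..."}.'
                ),
            },
            {"role": "user", "content": json.dumps(interaction_log)},
        ],
    ).choices[0].message.content

    # "Whenever X occurs in this story, place X into file Y"
    entries = []
    for line in story.splitlines():
        if line.startswith("CACHE:"):
            try:
                entries.append(json.loads(line[len("CACHE:"):].strip()))
            except json.JSONDecodeError:
                pass  # the model's line wasn't valid JSON; skip it
    existing = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else []
    CACHE_FILE.write_text(json.dumps(existing + entries, indent=2))
```

The idea would be to check `story_cache.json` for a matching question before calling the API, and only on a miss make the call, log the interaction, and periodically rerun the narration step to refresh the cache.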