How to cache LLM responses in recent LangChain versions for OpenAI GPT-4

I'm making an FAQ bot using the latest LangChain version, with pgvector as my vector store and GPT-4 (gpt-4-1106-preview).

I’ve looked for caching methods, but most of what I found are very old posts, and the example in the official documentation doesn’t work. I’ve also looked into GPTCache, but that project hasn’t been active for a while.

I want to save some API calls and also improve response time for repeated and similar questions.

Can anyone point me to any projects or resources on how to cache LLM responses?

Thank you in advance


A really fun workaround for this that I’ve used and proposed before is to build fairly advanced logging into the program you’re running: capture the initiations, inputs, and outputs of every interaction, then have the model itself translate them into a natural-language story of the events that occurred, written so it’s easy to read at a human level. From that generated story you can extract whatever you’d like to cache. For instance, you can set up rules like “whenever X occurs in this story, place X into file Y” and have that file serve as your cache, or a database, or however you’d like to store it. A rough sketch of that idea follows.
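
Here is a minimal, hypothetical sketch of that workaround in Python, assuming the OpenAI Python client (v1.x) and gpt-4-1106-preview. The helper names (`log_interaction`, `narrate_and_cache`), the `CACHE:` marker convention, and the `story_cache.json` file are all made up for illustration; they aren’t part of LangChain or any library.

```python
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()
CACHE_FILE = Path("story_cache.json")   # hypothetical cache store
interaction_log: list[dict] = []        # raw inputs/outputs of each interaction


def log_interaction(prompt: str, response: str) -> None:
    """Record every prompt/response pair as it happens."""
    interaction_log.append({"prompt": prompt, "response": response})


def narrate_and_cache() -> None:
    """Have the model retell the log as a readable story, then extract
    the question/answer pairs we want to reuse into the cache file."""
    story = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[
            {
                "role": "system",
                "content": (
                    "Retell the following interaction log as a short, "
                    "human-readable story of the events that occurred. Each "
                    "time a user question and its final answer appear, also "
                    "emit a line starting with 'CACHE:' followed by JSON "
                    'like {"q": "...", "a": "..."}.'
                ),
            },
            {"role": "user", "content": json.dumps(interaction_log)},
        ],
    ).choices[0].message.content

    # "Whenever X occurs in this story, place X into file Y"
    entries = []
    for line in story.splitlines():
        if line.startswith("CACHE:"):
            try:
                entries.append(json.loads(line[len("CACHE:"):].strip()))
            except json.JSONDecodeError:
                pass  # the model's line wasn't valid JSON; skip it
    existing = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else []
    CACHE_FILE.write_text(json.dumps(existing + entries, indent=2))
```

The idea would be to check `story_cache.json` for a matching question before calling the API, and only on a miss make the call, log the interaction, and periodically rerun the narration step to refresh the cache.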