Is it possible for the OpenAI API to cache the conversation?

Hey guys!

I’m developing an API that uses the OpenAI API.

In some cases there is communication between the client side and the server side, and this communication is sent as a prompt to the OpenAI API; each new prompt is a new request in my API.

I know that by managing the history on my side it is possible to give context to OpenAI and keep the conversation going, but this is a problem because with every new prompt I need to resend the context, which ends up spending too many tokens.

Is there any way to store the context on the OpenAI side?

There is a way, but it does not mitigate the cost: “Assistants” will maintain a chat session in a thread on the API, and resend it to the model every time.
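For reference, here is a minimal sketch of that flow using the beta Assistants endpoints of the official `openai` Python SDK (the model name and instructions are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Created once; the thread stores the conversation server-side.
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",  # example model name
    instructions="You are a helpful assistant.",
)
thread = client.beta.threads.create()

# Each user turn: append a message to the thread, then run the assistant on it.
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Hello!",
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

# Note: the thread only stores the transcript for you. On every run, the whole
# thread is still fed back to the model as input tokens, so you still pay for it.
```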

With your own code, you can manage your budget more effectively, choosing your own tradeoff between memory quality and expense by deciding how many past chat turns the AI should know about.
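As a sketch of that self-managed approach (the model name and the `MAX_TURNS` knob are just example choices):

```python
from openai import OpenAI

client = OpenAI()
MAX_TURNS = 6  # how many recent user/assistant exchanges to resend (budget knob)

system = {"role": "system", "content": "You are a helpful assistant."}
history = []  # full transcript, kept on your side

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    # Send only the system message plus the tail of the history:
    # fewer turns = cheaper calls, but less "memory".
    window = [system] + history[-(2 * MAX_TURNS):]
    resp = client.chat.completions.create(model="gpt-3.5-turbo", messages=window)
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```

Tuning `MAX_TURNS` (or switching to a token-counted window, or summarizing old turns) is exactly the quality/expense tradeoff described above.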


I think you need to open a separate topic for your problem.

About managing this on our side: we are already doing that here. What I’m asking is whether we could avoid resending the context and ‘losing’ tokens on it, perhaps via some request ID or context ID, you know?

The OpenAI models are stateless and memoryless. Every API call you make is independent. There is no way to refer to any previous generation or any previous state of the model that could “save tokens”.
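You can see the statelessness directly: two bare calls share nothing (a minimal demo; the model name is an example):

```python
from openai import OpenAI

client = OpenAI()

# Call 1: the model sees this message only.
client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "My name is Ada."}],
)

# Call 2: a fresh, independent request. Nothing from call 1 exists server-side,
# so the model cannot know the name unless you resend it yourself.
r2 = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is my name?"}],
)
print(r2.choices[0].message.content)
```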

Everything you want the AI to know and answer about must be placed into the context window on the next API call you make. This means both instructions about its operation (the system message) and the chat history (user/assistant pairs), along with prior tool function call results the AI may be building upon to answer (user / assistant-to-tool / tool-return / assistant).
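Concretely, a follow-up request that builds on a prior tool call has to replay the whole exchange in `messages` (the IDs, function name, and values here are made up for illustration):

```python
messages = [
    {"role": "system", "content": "You are a weather assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
    # The assistant's earlier decision to call a tool...
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_abc123",  # hypothetical ID
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }]},
    # ...the result your code returned for that call...
    {"role": "tool", "tool_call_id": "call_abc123", "content": '{"temp_c": 18}'},
    # ...the assistant's answer built on that result...
    {"role": "assistant", "content": "It's 18 °C in Paris right now."},
    # ...and only then the new user question that depends on all of the above.
    {"role": "user", "content": "And how about tomorrow?"},
]
```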

The billing of most AI models is asymmetric: you don’t pay as much to place past responses into input as it cost to generate them originally. Current attention layers and masking are highly efficient, generating the first token in under a second, to the point where I suspect the billing on large input contexts (encouraged by Assistants) is even more profitable for OpenAI.
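To put a number on the asymmetry (the per-token rates below are hypothetical placeholders; check the current pricing page for real figures):

```python
# Hypothetical example rates, NOT real prices.
INPUT_PER_1K = 0.0005   # $ per 1K input (prompt) tokens
OUTPUT_PER_1K = 0.0015  # $ per 1K output (generated) tokens

tokens = 300  # a past 300-token answer
generate_cost = tokens / 1000 * OUTPUT_PER_1K  # what it cost to produce originally
replay_cost = tokens / 1000 * INPUT_PER_1K     # what it costs to resend as history

print(f"generated once: ${generate_cost:.5f}, replayed as input: ${replay_cost:.5f}")
# With these example rates, replaying history is 3x cheaper than generating it.
```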