Is it possible for the OpenAI API to cache the conversation?

Hey guys!

I’m developing an API that uses the OpenAI API.

In some cases there is communication between the client side and the server side, and this communication is sent as a prompt to the OpenAI API; each new prompt is a new request in my API.

I know that by managing the history on my side it is possible to give context to OpenAI and keep the conversation going, but this is a problem because with every new prompt I need to resend the context, which ends up spending too many tokens.

Is there any way to store the context on the OpenAI side?

There is a way, but it does not mitigate the cost: “Assistants” will maintain a chat session in a thread on the API, and resend it to the model every time.
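For reference, here is a minimal sketch of that flow using the beta Assistants endpoints of the official `openai` Python SDK (the model name and instructions are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Created once; the thread stores the conversation server-side.
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",  # example model name
    instructions="You are a helpful assistant.",
)
thread = client.beta.threads.create()

# Each user turn: append a message to the thread, then run the assistant on it.
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Hello!",
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

# Note: the thread only stores the transcript for you. On every run, the whole
# thread is still fed back to the model as input tokens, so you still pay for it.
```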

With your own code, you can manage your budget more effectively, choosing your own tradeoff between memory quality and expense by deciding how many past chat turns the AI should know about.
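As a sketch of that self-managed approach (the model name and the `MAX_TURNS` knob are just example choices):

```python
from openai import OpenAI

client = OpenAI()
MAX_TURNS = 6  # how many recent user/assistant exchanges to resend (budget knob)

system = {"role": "system", "content": "You are a helpful assistant."}
history = []  # full transcript, kept on your side

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    # Send only the system message plus the tail of the history:
    # fewer turns = cheaper calls, but less "memory".
    window = [system] + history[-(2 * MAX_TURNS):]
    resp = client.chat.completions.create(model="gpt-3.5-turbo", messages=window)
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```

Tuning `MAX_TURNS` (or switching to a token-counted window, or summarizing old turns) is exactly the quality/expense tradeoff described above.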


I think you need to open a separate topic for your problem.

About managing this on our side: we are already doing that here. What I’m asking is whether we could avoid resending the context and ‘losing’ tokens on it, perhaps via some request ID or context ID, you know?

The OpenAI models are stateless and memoryless. Every API call you make is independent. There is no way to refer to any previous generation or any previous state of the model that could “save tokens”.
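You can see the statelessness directly: two bare calls share nothing (a minimal demo; the model name is an example):

```python
from openai import OpenAI

client = OpenAI()

# Call 1: the model sees this message only.
client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "My name is Ada."}],
)

# Call 2: a fresh, independent request. Nothing from call 1 exists server-side,
# so the model cannot know the name unless you resend it yourself.
r2 = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is my name?"}],
)
print(r2.choices[0].message.content)
```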

Everything you want the AI to know and answer about must be placed into the context window on the next API call you make. This means both instructions about its operation (the system message) and the chat history (user/assistant pairs), along with prior tool function call results the AI may be building upon to answer (user / assistant-to-tool / tool-return / assistant).
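Concretely, a follow-up request that builds on a prior tool call has to replay the whole exchange in `messages` (the IDs, function name, and values here are made up for illustration):

```python
messages = [
    {"role": "system", "content": "You are a weather assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
    # The assistant's earlier decision to call a tool...
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_abc123",  # hypothetical ID
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }]},
    # ...the result your code returned for that call...
    {"role": "tool", "tool_call_id": "call_abc123", "content": '{"temp_c": 18}'},
    # ...the assistant's answer built on that result...
    {"role": "assistant", "content": "It's 18 °C in Paris right now."},
    # ...and only then the new user question that depends on all of the above.
    {"role": "user", "content": "And how about tomorrow?"},
]
```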

The billing of most AI models is asymmetric: you don’t pay as much to place past responses into input as it cost to generate them originally. Current attention layers and masking are highly efficient, generating the first token in under a second, to the point where I suspect the billing on large input contexts (encouraged by Assistants) is even more profitable for OpenAI.
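To put a number on the asymmetry (the per-token rates below are hypothetical placeholders; check the current pricing page for real figures):

```python
# Hypothetical example rates, NOT real prices.
INPUT_PER_1K = 0.0005   # $ per 1K input (prompt) tokens
OUTPUT_PER_1K = 0.0015  # $ per 1K output (generated) tokens

tokens = 300  # a past 300-token answer
generate_cost = tokens / 1000 * OUTPUT_PER_1K  # what it cost to produce originally
replay_cost = tokens / 1000 * INPUT_PER_1K     # what it costs to resend as history

print(f"generated once: ${generate_cost:.5f}, replayed as input: ${replay_cost:.5f}")
# With these example rates, replaying history is 3x cheaper than generating it.
```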