Is it possible to reuse previous chat history on the OpenAI side to avoid sending repetitive tokens?


I have a task that involves sending the API the same large prompt repeatedly, with only a few new sentences added at the end each time. I wonder if OpenAI saves the previous chat in some way, perhaps as an embedding, so that I could send the same pre-prompt once and then resume each time with only the new questions. That way I would save a lot of token usage.


Use the Playground if you can; there you can go back and delete or keep any part of the chat history.

Yeah, I am looking for basically the same thing but with the API, because I have many requests running at the same time.


I’d second that this would be extremely useful: being able to save model state at the output of a prompt preamble (e.g., for a classification task), and then run that saved state against a series of prompt follow-ups. For example, "Classify the sentiment of this sentence: " could be processed once, and then a batch of sentences could be run without spending tokens on the preamble each time.

But it’d probably be hard for them to implement this in the API in an elegant way, and I’ve also read that it may not be possible given the architecture of the model (though I may have misinterpreted that).
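To make the request concrete, here is the interface I am imagining, written as a local Python stub. Nothing like this exists in the OpenAI API today; the class name and methods are entirely hypothetical, and the "caching" here is just local string storage standing in for imagined server-side state.

```python
# Entirely hypothetical interface for the requested feature. A local stub
# that shows the desired shape: pay for the preamble once, then batch.

class CachedPreamble:
    """Imagined wrapper: the preamble would be sent to the server only once."""

    def __init__(self, preamble: str):
        self.preamble = preamble  # imagined: cached server-side after one call

    def run(self, followup: str) -> str:
        # Imagined: only the follow-up's tokens would be billed here.
        return f"{self.preamble} {followup}"

clf = CachedPreamble("Classify the sentiment of this sentence:")
batch = [clf.run(s) for s in ["Great product!", "Awful service."]]
```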

That’s not how the models work. What you are asking for is simply not possible.

The models are stateless, so every time you exchange messages you must send everything you want the model to have in context.
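Concretely, that means the shared preamble has to be included in every single request. A minimal sketch (the preamble and questions are hypothetical; the message format follows the Chat Completions convention of role/content dicts):

```python
# Each request to a stateless chat model must carry the full context.
# The shared preamble is re-sent (and re-billed) with every new question.

PREAMBLE = "Classify the sentiment of this sentence:"  # hypothetical shared instruction

def build_messages(preamble: str, question: str) -> list[dict]:
    """Build the complete message list for one stateless request."""
    return [
        {"role": "system", "content": preamble},
        {"role": "user", "content": question},
    ]

questions = ["I loved the movie.", "The food was terrible."]
requests = [build_messages(PREAMBLE, q) for q in questions]

# The preamble's tokens are part of every request payload:
assert all(msgs[0]["content"] == PREAMBLE for msgs in requests)
```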


I wonder if it’s possible to use the Assistants API, since with it you don’t need to manage the context window yourself. For example, say there is a task description of 1,000 tokens followed by a stream of specific tasks. With the Completions API I have to send input in a “description - task” format every time. Could the Assistants API handle a “description - task - task - … - task” format instead?
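A rough back-of-envelope comparison, though, suggests a thread would not save tokens: as far as I understand, even when the service manages the thread for you, each run still feeds the accumulated context back through the model, so run k re-reads the description plus all earlier tasks. A small sketch with assumed constant sizes (the numbers are illustrative, not real billing figures):

```python
# Rough token accounting, assuming constant sizes for illustration only:
# d = tokens in the shared description, t = tokens per task, n = number of tasks.

def per_request_tokens(d: int, t: int, n: int) -> int:
    """'description - task' sent fresh each time: n requests of (d + t)."""
    return n * (d + t)

def thread_tokens(d: int, t: int, n: int) -> int:
    """'description - task - task - ...' thread: run k re-reads d plus tasks 1..k."""
    return sum(d + k * t for k in range(1, n + 1))

d, t, n = 1000, 50, 10
print(per_request_tokens(d, t, n))  # 10500
print(thread_tokens(d, t, n))       # 12750
```

Under this simple model, the growing thread actually costs more input tokens than re-sending the description each time, because earlier tasks accumulate in the context as well.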