I have a task that involves sending the same large prompt to the API repeatedly, with only a few new sentences at the end each time. I wonder if OpenAI saves the previous chat in some way, maybe as an embedding, so that I could send the shared pre-prompt once and resume each time with just the new questions. That way I'd hope to save a lot of token usage.
I’d second that this would be extremely useful: being able to save model state at the output level of a prompt preamble (e.g., for a classification task), and then run that state against a series of prompt follow-ups. For example, "Classify the sentiment of this sentence: " could run once, and then a batch of sentences could be run without needing to spend tokens on the preamble each time.
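For reference, the status quo is just to re-send the preamble with every item. A minimal sketch of that workaround (the preamble text and the example sentences are made up for illustration):

```python
# Status-quo sketch: the classification preamble must be re-sent with
# every sentence, so its tokens are billed on every single call.
PREAMBLE = "Classify the sentiment of this sentence: "  # hypothetical preamble

def build_prompts(sentences):
    """Return one full prompt per sentence, each repeating the preamble."""
    return [PREAMBLE + s for s in sentences]

batch = ["I loved it.", "It was awful.", "Not sure how I feel."]
prompts = build_prompts(batch)

# Every prompt carries the full preamble, so the preamble's token cost
# scales with the number of sentences instead of being paid once.
assert all(p.startswith(PREAMBLE) for p in prompts)
```

The feature being requested would amortize that repeated preamble cost down to a single payment.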
But it’d probably be hard for them to implement this in the API in an elegant way, and I’ve also read that it may not be possible given the model’s architecture (though I may have misinterpreted that).
I wonder if it’s possible to use the Assistants API for this, since with it I don’t have to manage the context window myself. For example, say there’s a task description of 1,000 tokens followed by a stream of specific tasks. With the Completions API I have to send input in a “description - task” format for every task. Could the Assistants API handle a “description - task - task - … - task” format instead?
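As a rough sanity check on what that layout would save in prompt tokens, here's a sketch comparing the two formats. The description string and task list are placeholders, and a crude whitespace word count stands in for the model's real tokenizer (both are assumptions for illustration):

```python
# Rough comparison of the two layouts. DESCRIPTION stands in for the
# 1,000-token task description; whitespace splitting is only a crude
# stand-in for the model's actual tokenizer.
DESCRIPTION = "task description " * 10  # hypothetical long preamble (20 words)
TASKS = ["task one", "task two", "task three"]

def est_tokens(text):
    """Very rough token estimate: whitespace word count."""
    return len(text.split())

# Completions-style: "description - task" is sent once per task.
per_task = sum(est_tokens(DESCRIPTION + t) for t in TASKS)

# Thread-style: "description - task - task - ... - task" sent as one context.
combined = est_tokens(DESCRIPTION + " ".join(TASKS))

# The combined layout includes the description only once.
assert combined < per_task
```

One caveat, as I understand it: even with the Assistants API, the whole thread (instructions plus prior messages) is still fed to the model on each run, and you're billed for those tokens each time; the API manages the context window for you, but it doesn't cache model state.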