How do I avoid wasting tokens by passing old messages

right now i have set the prompt instruction of 200 tokens
each time i am hiitng the openai api with user question ,200 tokens is also going with the user question so user question+200prompt
is there any way that my prompt instruction should go only one time not eveery time when user ask any question

1 Like

The API is “stateless”, in that nothing is being persisted on OpenAI’s side to help the LLM continue the conversation on their end. So yes, you do need to send the full conversation history every time, unfortunately.

The model does not have any memory. It is, in one sense, a dice roller.
You give it some text, and it rolls the dice and looks up in a table what the next output should be, based on the previous text and the dice roll.
Crucially, one important input to the table, is what the previous text is, and that previous text goes into the context of the model.
If the previous text isn’t there, the table won’t work right. And you must use the context to put in the previous text. No way around it!

one idea I had was to ask chat got to summarise the chat history,you program can iteratively update this saving tokens

e.g,. what is 10 plus 20

now add 10
divide by 2

summarise this in mathematical format

saves a lot of tokens
then send this as the context?