Do I need to send prompt every time for a same task?


I’m new to OpenAI API. I have a question about the completion API. If I have the same tasks that may have same prompt, but different questions, do I need to send prompt every time when I called the API? Can those prompts somehow store?

For example:

Prompt: tell me if it is an animal or a fruit.
Question: Apple

Response: fruit

Prompt: tell me if it is an animal or a fruit.
Question: Cat

Response: animal

Do I need to send this “Prompt: tell me if it is an animal or a fruit.” every time when I called the API?

Thanks for your time to answer my question!

Hi @link,

it depends on what model you are using and weather you consider using finetuning or embeddings to improve your use case. So if you are planning on scaling this up I’d advice to look this two solutions up and find out more about them.

Generally speaking however and using the regulare API @rajuroyal is right - there is no way and I think his advice to store the prompts in your Application is very good.


I thought there was a way in ChatGPT’s API to specify a running context such as:

"For all subsequent questions, assume the answer should be 
either animal or fruit."

Hi @bill.french,

each time you call the API you send over the whole context of the conversation, so the API doesn’t remember anything. You could specify the context as you mentioned but this will only be temporary and if you send a new API-Call this also will be resubmitted.

But what @rajuroyal could do and I believe what you’ve meant is that you can provide one prompt and then couple this with multiple Questions. But you’d have to put that all in one prompt

I see. Yes. But combined with LangChain, is this possible?

Langchain opens up a lot of interesting possibilities in this case however if you want to optimize for token usage I’m not sure how this would help. This is because if you have the basic model without modifications you’ll allways would have to tell it what you want, LC would be a way to make the process mire streamlined but the prompt still has to be submitted each process right?

Im no LC expert so please excuse if im wrong here.

In laymen terms, Chat GPT-4 API does not “remember” your responses like when using it in the browser. To have a conversation with it, you need to constantly update and provide the previous conversation to it for context.

You achieve this by saving the previous response and adding it to the messages array every time you make a new API request / chat message. In the example above, there are 3x previous messages from the user and GPT-4 combined and the 4th one is the “new one” for GPT-4 to answer.

Here’s a code example of how “sending back the whole conversation” can look, with roles.

1 Like

Thanks for sharing and explaining in simpler words! :slight_smile:

I have begun to see this challenge in a little different light. Perhaps I’m just getting my sea-legs, but the aha moments are coming fast now.

The goal is to ensure the AI entity has at least a short-term memory, a context that helps users avoid needing to repeat themselves. This is more about UX than AI, per se. My experience is that memory begins to degrade as soon as you are forced to chop the conversation at some point because of token or cost limitations.

To overcome this, I have used this approach with pretty good results. I would love others to lend their thoughts about this.

The user asks:

How many reservations have been sold to date?

The very next question provides no context except the time frame:

How many in 2021?

The process can be achieved with separate calls to the API; one to update the context and another to perform the latest query with that context. I have also experimented with a single inference that updates the context and addresses the latest query without revealing the updated context in the response.

Lastly, I have also seen this dialog unfold without providing context from message to message. I think it’s made possible for three reasons:

  1. The “model” corpus is based on embeddings and serves as a closed environment.
  2. The AI has no room to drift; the context is pre-established and logically concludes what the user is referring to.
  3. The embedding vector matching is tight; it has little room for misunderstandings.

Which endpoint did you use? As the chat endpoint accepts an array, while the completion endpoint accepts just text, I believe this can affect how you do the code. Are you saying that instead of passing an array to the chat endpoint, you instead summarized the array and that’s what youre using as previous context?