OpenAI Assistant API: One Run with many messages or One Run per message

Hello everyone,

I have a question if anyone has an opinion here.
In my scenario I supply a document with information.
Then I have a list of questions which I want the assistant to answer.

I can see two ways to do this:

  1. Create a list of messages (one message per question), add them to the thread, then create a run and retrieve the messages (answers) when the run completes.

  2. Create one message and then create a run and retrieve the answer to this question when the run completes. Repeat this for all questions.

Any opinion on whether 1) is better than 2)?
So far I do 1)…

Thank you!
Happy Coding!

  • Felix

Messages are not jobs to be performed. They are parts of a chat.

Given a bunch of user messages in a row, the AI will mostly focus on the last input as the question to be answered. Prior turns of user/assistant messages are seen as past conversation that is not to be answered; they are only relevant to maintaining the topic memory of a chat.


Thank you for your reply.
If I understand you correctly, you are proposing solution 2) ?

“Messages” are chat. If you’re not interested in having the AI remember a conversation, you simply supply a single user input and get your AI response, and then abandon the thread.
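As a rough sketch of that "single input, get the response, abandon the thread" pattern: the function below creates a fresh thread per question, runs it, and reads back the newest message. It assumes the beta Assistants endpoints of the openai Python library (threads.create, messages.create, runs.create_and_poll, messages.list) and that the first content part of the reply is text. The function name and the choice to record None for a run that does not complete are illustrative only.

```python
def ask_one_run_per_question(client, assistant_id, questions):
    """Ask each question in its own thread and run (option 2 in the thread).

    `client` is assumed to be an openai.OpenAI() instance and `assistant_id`
    an assistant that already has the document attached.
    """
    answers = []
    for question in questions:
        # A new thread per question means no earlier turns compete for attention.
        thread = client.beta.threads.create()
        client.beta.threads.messages.create(
            thread_id=thread.id, role="user", content=question
        )
        run = client.beta.threads.runs.create_and_poll(
            thread_id=thread.id, assistant_id=assistant_id
        )
        if run.status != "completed":
            answers.append(None)  # illustrative choice: record the failure and move on
            continue
        # Newest message first, so index 0 is the assistant's reply.
        messages = client.beta.threads.messages.list(thread_id=thread.id, order="desc")
        # Assumes the first content part is a text part.
        answers.append(messages.data[0].content[0].text.value)
    return answers
```

Each thread is simply left behind after its answer is read, which matches the "abandon the thread" advice above.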

Typically, you will want to get as much “answering” done as possible in one API call. You have to pay for the full context input of documents being used for every call. So: if I use chat completions and reference a 50k token document that I include completely, that is $0.50 right there without anything yet being written by the AI. But all the knowledge is there to answer any question.
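To make that arithmetic concrete, here is a small sketch comparing the input-side cost of one call carrying all questions against one call per question. The price of $0.01 per 1,000 input tokens is only the rate implied by the "50k tokens is $0.50" figure above, and the question count and question length are made-up numbers; check current pricing before relying on any of it.

```python
# Assumed rate, implied by "50k tokens = $0.50" in the post; not a quoted price.
PRICE_PER_1K_INPUT_TOKENS = 0.01


def input_cost(document_tokens, question_tokens, calls):
    """Input-side cost in dollars when the full document is resent on every call."""
    tokens_per_call = document_tokens + question_tokens
    return calls * tokens_per_call / 1000 * PRICE_PER_1K_INPUT_TOKENS


doc_tokens = 50_000
num_questions = 10          # hypothetical
tokens_per_question = 50    # hypothetical

# All questions packed into a single call: the document is paid for once.
one_call = input_cost(doc_tokens, num_questions * tokens_per_question, calls=1)
# One call per question: the document is paid for every time.
per_question = input_cost(doc_tokens, tokens_per_question, calls=num_questions)

print(f"all questions in one call: ${one_call:.3f}")
print(f"one call per question:     ${per_question:.3f}")
```

Output tokens are ignored here, since they are roughly the same either way; the difference comes from how many times the document is resent.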

When the list of questions becomes longer, the AI may lose focus of what it is answering, so it is good to use an output format like JSON where you have for each item being produced something like “question summary” and “AI answer”, so the AI is immediately reproducing the question to be answered before the answer, and can focus on the particular item.
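A minimal sketch of that output format, assuming you build the prompt yourself and get the reply back as a JSON string. The key names "question_summary" and "answer" and both helper functions are illustrative choices, not part of any API.

```python
import json

REQUIRED_KEYS = {"question_summary", "answer"}


def build_prompt(questions):
    """Number the questions and ask for one JSON object per question."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return (
        "Answer every question below using only the supplied document.\n"
        "Reply with a JSON array only. Each element must be an object with the "
        'keys "question_summary" and "answer", in the same order as the questions.\n\n'
        + numbered
    )


def parse_answers(reply_text, expected_count):
    """Parse the reply and check that no question was skipped."""
    items = json.loads(reply_text)
    if not isinstance(items, list) or len(items) != expected_count:
        raise ValueError("reply does not contain one item per question")
    for item in items:
        if not isinstance(item, dict) or not REQUIRED_KEYS <= item.keys():
            raise ValueError("item is missing a required key")
    return items


# Hypothetical questions, for illustration only.
questions = ["What is the notice period?", "Who signs the contract?"]
print(build_prompt(questions))
```

Checking the item count after parsing gives a cheap way to detect when the AI dropped a question, so you can retry just that case.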

Many people try to make the AI produce a series of true or false answers with no grounding, which quickly looks like meaningless text.

There is a significant difference, though, when using Assistants with retrieval documents larger than can fit in the model context, as you:

  • have no control over what data is being searched on or explored by an AI that has a search function;
  • have an AI that has to find answers for everything before starting to answer, flooding the context with off-topic results for the other questions if it doesn’t simply give up and give bad answers.

So the optimum strategy when using vector database semantic RAG is to pose one question per call, and use the efficiency of the RAG search and threshold cutoff on obtaining relevant information for one question.
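The one-question-per-call retrieval with a threshold cutoff could look roughly like the sketch below. To keep it runnable without an API key, embed() is a toy bag-of-words stand-in for a real embedding model, and the chunks, the threshold of 0.3, and top_k are made-up values; in practice the vectors would come from an embeddings model and a vector database.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, threshold=0.3, top_k=3):
    """Return up to top_k chunks whose similarity to the question meets the threshold."""
    query = embed(question)
    scored = [(cosine(query, embed(chunk)), chunk) for chunk in chunks]
    relevant = [pair for pair in scored if pair[0] >= threshold]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in relevant[:top_k]]


# Hypothetical document chunks, for illustration only.
chunks = [
    "Invoices are due within 30 days of the billing date.",
    "Support is available by email on weekdays.",
    "Refunds are issued within 14 days of a return.",
]

# One question per call: each prompt carries only the chunks relevant to it.
for question in ["When are invoices due?", "When are refunds issued?"]:
    context = "\n".join(retrieve(question, chunks))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    print(prompt, end="\n\n")
```

Because the cutoff is applied per question, chunks that only matter to other questions never enter the context for this one.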

Assistants are not efficient: they load up the context with documentation whether or not it contains relevant answers, which increases the bill. So while single questions will still be best, operating costs will be high regardless.

If this is a serious task, I would look at incorporating embeddings-based retrieval that can fulfill an automated search for information from a user question much better and more efficiently than assistants.


Depending on how complex the questions are, it sounds like #2 might be better. If you do #1, it might not be able to answer them all, or it might just forget to answer some of them.

If you want to do #1 and are having trouble getting it to answer all the questions, the prompt below is one way to help it remember. I gave it an easy list, but this should work with a more complex list of tasks.

I have four questions for you to answer:

  1. Who is the main character in the movie Office Space?
  2. Please write a short paragraph describing the main character.
  3. Who is the main character’s romantic interest?
  4. Write a short paragraph about the romantic interest.

Please use the following format to answer the questions:

I have [N] questions to answer. I will start with the first one.

I will now answer question [1]. After this I will answer question [2].

Here is question [1]:

Question: [TEXT OF QUESTION 1]


I will now answer question [2]. After this I will answer question [3].


I will now answer question [N]. After this I am finished.

Here is question [N]: