Repeated questions improve the quality of the assistant's responses

Dear OpenAI users,

💡 I noticed an interesting pattern while working with the Assistants API:

Ask the assistant something and you’ll get a very mediocre answer.
Repeat the question, and you’ll get a much better, more precise one.

Prefixing my repeated question with “Are you sure?” worked the best.

I use various assistants built on the GPT-4o and GPT-4.1 models, with JSON data files uploaded to the code interpreter, and the pattern shows up for all of them to some extent.

Have any of you spotted the same behavior from the Assistants API?

Curious.

Thank you!


Do you mean that you are achieving this behavior when including the previous question/response pair within the subsequent call? I.e., you are not resetting/erasing the context window, but rather building up the context window with the previous exchange alongside the new query?

If that’s what you mean, then yes, absolutely. This is a standard caveat I’ve noticed using LLM systems: the more context you provide, the more detailed the response is going to be.
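For anyone wanting to try this, here is a minimal sketch of what "building up the context window" means, using the openai Python SDK's Chat Completions API. The model name, question, and follow-up wording are just illustrative placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [{"role": "user", "content": "How do quarterly sales break down by region?"}]

# First pass: often a mediocre answer.
first = client.chat.completions.create(model="gpt-4o", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# Second pass: the previous Q/A pair stays in the context window,
# so the model refines its earlier answer instead of starting cold.
messages.append({"role": "user", "content": "Are you sure? Please double-check your answer."})
second = client.chat.completions.create(model="gpt-4o", messages=messages)
print(second.choices[0].message.content)
```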

Thus, it works extremely well to break your questions into more complete logical blocks (i.e., staging the question, examining its parts, etc.) and address them all point by point - or, even better, get the assistant to first "analyze the question and break it down for you" - then, whether in single-shot or multi-shot, gather the responses at each level, and possibly prompt for a synthesis/analysis of all the previously produced responses to obtain the desired result.
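As a rough sketch of that analyze/decompose/synthesize flow (assuming the Chat Completions API; the prompts and question are only illustrative):

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"

def ask(messages):
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

question = "Which product line should we discontinue, and why?"

# Stage 1: have the model stage the question and break it into parts.
history = [{"role": "user",
            "content": f"Analyze this question and break it into its logical parts:\n{question}"}]
breakdown = ask(history)
history.append({"role": "assistant", "content": breakdown})

# Stage 2: address the parts point by point.
history.append({"role": "user", "content": "Now address each part you identified, point by point."})
history.append({"role": "assistant", "content": ask(history[:-0] if False else history)})

# Stage 3: synthesize everything produced so far into one final answer.
history.append({"role": "user",
                "content": f"Synthesize the analysis above into the best possible answer to:\n{question}"})
print(ask(history))
```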


Yes, indeed. I’m not resetting; I’m adding more context.

Since I build systems that provide arbitrary free-form chat interfaces to end users, I think I’ll pick this simple strategy (sketched in code below the list):

  • Warm-up/analyze call: “Analyze the question: [whatever user asks]”
  • Final answer call: “Given the information collected in the previous step, provide the best possible answer to the question: [whatever user asks]”
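Here is roughly what those two calls look like against a shared thread, assuming the openai Python SDK's beta Assistants namespace; the assistant ID and prompts are placeholders:

```python
from openai import OpenAI

client = OpenAI()
ASSISTANT_ID = "asst_..."  # placeholder: your existing assistant
user_question = "..."      # whatever the user asks

thread = client.beta.threads.create()

# Call 1: warm-up/analyze. The analysis stays in the thread's context.
client.beta.threads.messages.create(
    thread_id=thread.id, role="user",
    content=f"Analyze the question: {user_question}")
client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=ASSISTANT_ID)

# Call 2: final answer, building on the analysis already in the thread.
client.beta.threads.messages.create(
    thread_id=thread.id, role="user",
    content=f"Given the information collected in the previous step, "
            f"provide the best possible answer to the question: {user_question}")
client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=ASSISTANT_ID)

# Messages list newest-first by default, so .data[0] is the final answer.
answer = client.beta.threads.messages.list(thread_id=thread.id).data[0]
print(answer.content[0].text.value)
```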

I can’t really stack too many calls, because it would hurt chat response time and make my users unhappy.

Thank you for the insights! 🙏

That makes sense, and I’m guessing that a two-shot approach will work much better than a single-shot approach, absolutely. You could also provide the GPT with a sample set that represents the diversity and depth of typical questions (if possible) and get IT to generate a “reasoning blueprint” or “answer-formatting blueprint” for itself; if your use case can be generalized to a significant extent, that would likely return you to a single-shot model.

I.e., provide enough sample data and get the GPT to “come up with best practices for robustly answering the questions in a single shot” by giving it more context about what kind of responses you are looking for, possibly even standardizing the output format in a conceptual way, not necessarily a structured one (the GPT works great when given a “form to fill in with details,” essentially).
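A minimal sketch of that blueprint-generation step, assuming the Chat Completions API; the file names and prompt wording are hypothetical:

```python
from openai import OpenAI

client = OpenAI()

# Placeholder: a sample set representing the diversity and depth of
# typical questions, ideally paired with examples of good answers.
samples = open("sample_questions_and_answers.txt").read()

blueprint = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": (
            "Here is a representative sample of the questions my users ask, "
            "with examples of good answers:\n\n" + samples + "\n\n"
            "Come up with best practices for robustly answering such questions "
            "in a single shot, including a conceptual 'form to fill in with "
            "details' for responses. Output this as a guidelines document."
        ),
    }],
).choices[0].message.content

# Save the generated guidelines for use as system instructions later.
open("guidelines.md", "w").write(blueprint)
```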

Then you upload the “guidelines documentation” that you had the GPT produce from your sample-case scenarios and use it within the System Instructions or Developer Message, depending on your system’s use of the API.

Then, test your questions again, and see what kind of results you are getting.
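Putting it together, the single-shot call would look something like this (again a sketch, with placeholder paths and question):

```python
from openai import OpenAI

client = OpenAI()

# The guidelines document produced in the previous step (placeholder path).
guidelines = open("guidelines.md").read()
user_question = "..."  # whatever the user asks

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # The system message carries the "contextual awareness"; input
        # tokens are cheap, so a few thousand tokens here is fine.
        {"role": "system", "content": guidelines},
        {"role": "user", "content": user_question},
    ],
)
print(response.choices[0].message.content)
```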

Usually, in building end-user systems, you are going to want to avoid multi-turn, because that takes much longer than, say, passing in up to several thousand input tokens of “contextual awareness” within the developer message/system instructions. My earlier advice assumed this was for your own personal use of the GPT, not a public-facing system with end users.

In this case, I would recommend that you spend your time working with the GPT to analyze your normal use-case scenarios and essentially “outline the desired response style/format/level of detail, etc.”

Then you simply provide that information to the GPT along with the user’s question, and voilà: in a single shot you most likely get exactly what you would have gotten from multi-shot with the “warm up”.

Or, probably even better, spend your time robustly developing the system instructions/developer message that will be sent to the GPT.

Remember, input tokens are cheap. We are talking $0.001 to $0.01 for several thousand tokens.

For example, I routinely ask the GPT questions with a relatively simple “prompt”, but I provide it with 20k-50k tokens of documentation and instructions as context!

And then you get extremely good results.
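As a back-of-envelope check on those numbers, assuming an illustrative input price of $2.50 per million tokens (rates vary by model and change over time, so treat this as an assumption):

```python
# Illustrative input-token cost; check current pricing for your model.
PRICE_PER_TOKEN = 2.50 / 1_000_000

for tokens in (4_000, 20_000, 50_000):
    print(f"{tokens:>6} tokens -> ${tokens * PRICE_PER_TOKEN:.4f}")
# A few thousand tokens costs about a cent; even 50k tokens is ~$0.125.
```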


I think this effect is close to what they call Chain of Thought (CoT). It’s likely how the o3 model achieves higher effective intelligence than the GPT-4o model: it’s likely using a form of CoT against the same old 4o model behind the scenes. Similarly, reflective models (where the system record is shifted but the context window is otherwise the same) will also increase the effective quality of the discourse.
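A minimal sketch of the "reflective" pattern described here - same conversation context, but the system record shifted into a critic/reviser role for a second pass (all prompts illustrative, and this says nothing about how o3 actually works internally):

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"

convo = [{"role": "user", "content": "Explain why the sky is blue."}]
draft = client.chat.completions.create(model=MODEL, messages=convo).choices[0].message.content

# Reflection pass: the context window is otherwise the same, but the
# system record is shifted to put the model in a reviewer role.
reflect = [
    {"role": "system", "content": "Critique the draft answer below for errors "
                                  "and omissions, then produce an improved final answer."},
    *convo,
    {"role": "assistant", "content": draft},
]
final = client.chat.completions.create(model=MODEL, messages=reflect).choices[0].message.content
print(final)
```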
