Hi all,
I'm using LangGraph to build an AI agent with several tools bound to the model.
I chose o3-mini as the model and deployed it on Azure.
It mostly works well: the agent first accepts the question, analyzes it, selects the proper tools, and finally generates an answer based on what the tools returned.
But sometimes I get a response like “One moment please. Let me query xxx first” or “Please hold on while I …” (as shown below).
Since the ReAct loop has already ended at that point, I'm left waiting and never actually get the final answer.
I initialize the model with the snippet below:
AzureChatOpenAI(
    azure_endpoint=_get_azure_openai_endpoint(),
    deployment_name=_get_azure_openai_deployment(),
    model_name=_get_azure_openai_model(),
    openai_api_version=_get_azure_openai_api_version(),
    openai_api_key=_get_azure_openai_api_key(),
    temperature=None,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    reasoning_effort='medium',
)
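To make the problem concrete, here is a rough sketch of a guard that detects these "please wait" replies; the phrase list and function name are just illustrative, not anything from my actual code:

```python
import re

# Phrases that suggest the model "deferred" instead of finishing.
# Illustrative list; would need tuning to the stalls actually observed.
STALL_PATTERNS = re.compile(
    r"(one moment|please hold on|hold on while|let me query|please wait)",
    re.IGNORECASE,
)

def looks_like_stall(text: str, has_tool_calls: bool) -> bool:
    """True if the reply promises future work but requests no tool calls."""
    return not has_tool_calls and bool(STALL_PATTERNS.search(text))
```

The idea would be to run something like this on the final AI message and re-invoke the agent when it fires, but I'd prefer a parameter or prompt fix over this kind of post-processing.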
My system prompt looks like this (but it doesn't seem to work):
You are a network expert and good at calling the right APIs to fetch information about XXX. Please help the user solve problems by querying information in XXX, and explain your reasoning before answering.
- Use the related tools to solve the problem right away after reasoning. DON'T prompt the user to call tools themselves; do it for them.
- DON'T make the user wait; follow the reasoning steps and get the answer right away.
- Base your answer on the output of the tools; DON'T fabricate any details.
- Provide the best answer in a user-friendly Markdown format.
- Provide a full and complete response. DON’T stop midway or leave the answer unfinished.
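What I effectively want the graph to do after the agent node is something like the routing below. This is only a sketch of the intended logic, using plain dicts in place of LangChain messages; `route_after_agent` and the regex are hypothetical, not LangGraph API:

```python
import re

# Illustrative "deferral" phrases; not an exhaustive list.
STALL = re.compile(r"(one moment|hold on|please wait|let me (query|check))", re.I)

def route_after_agent(state: dict) -> str:
    """Conditional-edge logic: keep looping until a real answer exists.

    state["messages"][-1] stands in for the last AI turn, modeled here
    as a dict with "content" (str) and "tool_calls" (list).
    """
    last = state["messages"][-1]
    if last.get("tool_calls"):
        return "tools"   # execute the requested tools
    if STALL.search(last.get("content", "")):
        return "agent"   # stalled reply: force another model turn
    return "end"         # genuine final answer

# A stalling turn with no tool calls should be routed back to the agent.
state = {"messages": [{"content": "Please hold on while I query the device list.",
                       "tool_calls": []}]}
```

In other words, instead of ending the run on a "hold on" message, the graph should loop back and make the model actually do the work.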
What I want is for the agent to never make the user wait: after reasoning, it should follow through on the steps immediately.
Is there a parameter I can use to control this behavior? Or how can I improve the system prompt?
By the way, I am not using streaming mode.