GPT-5 Reasoning Effort Impact on Agent Performance

I’m building an agent on top of the Agents SDK with gpt-5 as my underlying model.

I’m wondering if anyone else is doing the same and has found any evidence of performance impacts from switching reasoning_effort from the default “medium” to “low” or even “minimal”. If not personal experience, have you seen any good research on this?

Our agent is a knowledge work assistant that is primarily used to gather and synthesize data from a bunch of sources. The agent has a handful of high-level tools it uses to retrieve context.

I’ve been experimenting with the effects of changing the reasoning_effort param from “medium” to “low”.

The primary motivation for this is that when I look at my agent traces, very little of the latency is due to tool calls. It’s almost all the model sitting and thinking before or after tool calls.
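For context, the change itself is just a model setting. Here is a minimal sketch of how I’m flipping it in the Python Agents SDK (the tool is a stub, and the exact ModelSettings / Reasoning fields may vary by SDK version, so check yours):

```python
# A minimal sketch of switching reasoning_effort in the Python Agents SDK.
# The tool is a stub for the real context-gathering tools.
from agents import Agent, ModelSettings, Runner, function_tool
from openai.types.shared import Reasoning


@function_tool
def search_knowledge_base(query: str) -> str:
    """Stub for the real retrieval tools the agent calls."""
    return f"(results for {query!r} would go here)"


agent = Agent(
    name="knowledge-assistant",
    instructions="Gather context with your tools, then synthesize an answer.",
    model="gpt-5",
    # The only change between experiment runs: "medium" (default) vs "low" vs "minimal".
    model_settings=ModelSettings(reasoning=Reasoning(effort="low")),
    tools=[search_knowledge_base],
)

result = Runner.run_sync(agent, "Summarize what we know about account X's renewal risk.")
print(result.final_output)
```

Everything else stays identical between runs; only the effort value changes.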

The experiments have been interesting. Some findings so far:

  • On high-complexity tasks requiring lots of context gathering and synthesis, latency drops significantly when turning reasoning down. Quality does not seem to degrade much, if at all.
  • For less complex questions, latency does drop with lower reasoning, but not by a very meaningful amount.
  • Sure enough, low reasoning burns far fewer reasoning tokens even when the model takes the exact same tool-calling path (a simple way to measure this is sketched below).
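For anyone who wants to reproduce the latency / token comparison, you don’t need the full agent loop; a bare Responses API call shows the same gap. A rough sketch (the prompt is a placeholder, and the reasoning-token count is read from usage.output_tokens_details, assuming your openai SDK version exposes it):

```python
# Compare latency and reasoning-token usage across effort levels with a bare
# Responses API call. The prompt is a placeholder for a real task.
import time
from openai import OpenAI

client = OpenAI()
PROMPT = "Compare the renewal terms across these three contracts: ..."  # placeholder task

for effort in ("minimal", "low", "medium"):
    start = time.perf_counter()
    resp = client.responses.create(
        model="gpt-5",
        reasoning={"effort": effort},
        input=PROMPT,
    )
    elapsed = time.perf_counter() - start
    details = resp.usage.output_tokens_details  # includes reasoning_tokens
    print(f"{effort:>7}: {elapsed:5.1f}s, "
          f"{details.reasoning_tokens} reasoning tokens, "
          f"{resp.usage.output_tokens} total output tokens")
```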

Based on these results I started to think, great, I can get away with low reasoning effort. However:

  • Sometimes with low reasoning, the agent misses a critical step / tool call that ultimately causes it to produce an incorrect final response. An example: it does one batch of tool calls and gets partial information. On medium reasoning the model would do another batch and attempt to clarify ambiguities. On low it does not.

My experiments have been small in scale so far. If anyone else has insights here I’d love to hear them.

If your agent is performing simple tasks (extracting simple text fields or sentiment, chatting about a support ticket, etc.) then reasoning effort low is probably fine, and will be much, much faster. Set reasoning higher when you want the model to think deeply about a question, for example when you want it to look at an entire tech support chat and judge whether the agent was effective at applying training materials correctly. To some extent you can set reasoning low and gather information quickly, then raise the reasoning level to make sure things are on track. You can even do this in parallel so the user is experiencing an interactive discussion but there is a smart supervisor in the background. If your use case is not a chat-style agent, you can still use a low reasoning effort to quickly determine whether or not a long think is needed. If so, call the agent again with reasoning cranked up.
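Here is a rough sketch of that low-then-escalate pattern, assuming the Responses API; the ESCALATE convention and the prompts are made up for illustration, not a tested recipe:

```python
# Triage-then-escalate: a cheap low-effort pass answers easy questions and
# flags hard ones, which get a second pass at high effort.
from openai import OpenAI

client = OpenAI()


def answer(question: str) -> str:
    # First pass: low effort, fast. Ask the model to flag questions that need deeper work.
    triage = client.responses.create(
        model="gpt-5",
        reasoning={"effort": "low"},
        input=(
            "Answer the question if it is straightforward. If it needs deeper "
            "analysis or more context gathering, reply with exactly ESCALATE.\n\n"
            + question
        ),
    )
    draft = triage.output_text.strip()
    if draft != "ESCALATE":
        return draft

    # Second pass: only the hard cases pay the high-effort latency cost.
    deep = client.responses.create(
        model="gpt-5",
        reasoning={"effort": "high"},
        input=question,
    )
    return deep.output_text


print(answer("Which of our Q3 support escalations violated the SLA, and why?"))
```

In a chat-style agent you could kick off the high-effort call in the background instead of blocking on it, which is the parallel supervisor idea above.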

I agree with all of this just based on my experiments and what I’ve read elsewhere.

But I’m curious if there’s data to demonstrate this, i.e. how much does switching reasoning effort by a step impact the number of tool calls, end-to-end accuracy, etc.?

Great point… data vs. anecdotes! I do not have recorded data at the granularity of reasoning levels… it’s multivariate, since you would also have to track max tokens, temperature, etc. to plot this out. OpenAI may have it… Useful experiments would not be hard, but could take some time and cost, especially if there’s any kind of temperature setting > 0. I spend $ on reliability experiments (i.e. can I get model X to produce result Y >> 99% of the time) or on comparing models so I can tell a customer that model X “is better than” model Y at task Z. Post back if you find reasoning comparison data anywhere, please!
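To make the shape of the experiment concrete, it is basically just a loop. A sketch like the one below would do it, where the dataset, the stub tool, and the substring grader are placeholders, and the item.type == "tool_call_item" check is how I believe the Agents SDK labels tool-call run items (verify against your version):

```python
# Run each reasoning effort N times over a small labelled set and record
# pass rate, tool-call count, and latency. Dataset, tool, and grader are stubs.
import time
from agents import Agent, ModelSettings, Runner, function_tool
from openai.types.shared import Reasoning


@function_tool
def lookup(query: str) -> str:
    """Stub retrieval tool; swap in the real context-gathering tools."""
    return f"(results for {query!r})"


DATASET = [("Which region had the highest churn last quarter?", "EMEA")]  # placeholder eval set
N_TRIALS = 5

for effort in ("low", "medium"):
    agent = Agent(
        name="knowledge-assistant",
        instructions="Gather context with your tools, then answer.",
        model="gpt-5",
        model_settings=ModelSettings(reasoning=Reasoning(effort=effort)),
        tools=[lookup],
    )
    passes = tool_calls = 0
    latency = 0.0
    for question, expected in DATASET:
        for _ in range(N_TRIALS):
            start = time.perf_counter()
            result = Runner.run_sync(agent, question)
            latency += time.perf_counter() - start
            tool_calls += sum(1 for item in result.new_items if item.type == "tool_call_item")
            passes += expected.lower() in result.final_output.lower()
    total = len(DATASET) * N_TRIALS
    print(f"{effort}: pass {passes}/{total}, avg tool calls {tool_calls / total:.1f}, "
          f"avg latency {latency / total:.1f}s")
```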