I’m building an agent on top of the Agents SDK with gpt-5 as my underlying model.
I’m wondering if anyone else is doing the same and has found any evidence of performance impacts from switching reasoning_effort
from the default “medium” to “low” or even “minimal”? If not personal experience, have you seen any good research on this?
Our agent is a knowledge work assistant that is primarily used to gather and synthesize data from a bunch of sources. The agent has a handful of high-level tools it uses to retrieve context.
I’ve been experimenting with the effects of changing the reasoning effort param from “medium” to “low”.
The primary motivation for this is that when I look at my agent traces, very little of the latency is due to tool calls. It’s almost all the model sitting and thinking before or after tool calls.
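For context, here’s roughly how I’m toggling the effort, a minimal sketch assuming the Python openai-agents package (the exact ModelSettings/Reasoning plumbing may differ by SDK version, and the tools and prompt here are placeholders):

```python
from agents import Agent, ModelSettings, Runner
from openai.types.shared import Reasoning

# Same agent definition across experiments; only the reasoning effort changes.
agent = Agent(
    name="knowledge_assistant",
    instructions="Gather and synthesize information from the connected sources.",
    model="gpt-5",
    tools=[],  # our handful of high-level retrieval tools go here
    model_settings=ModelSettings(
        reasoning=Reasoning(effort="low"),  # "minimal" | "low" | "medium" | "high"
    ),
)

result = Runner.run_sync(agent, "example user question")
print(result.final_output)
```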
The experiments have been interesting. Some findings so far:
- On high complexity tasks requiring lots of context gathering and synthesis, latency drops significantly when turning reasoning down. Quality does not seem to degrade much if at all.
- For less complex questions, latency does drop with lower reasoning, but not by a meaningful amount.
- Sure enough, low reasoning burns far fewer reasoning tokens even when the model takes the exact same tool-calling path (rough measurement sketch after this list).
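This is the kind of quick comparison I’ve been running to check that, hitting the Responses API directly rather than going through the SDK traces; to the best of my knowledge the reasoning token count lives under usage.output_tokens_details, and the prompt is just a stand-in for one of our eval questions:

```python
from openai import OpenAI

client = OpenAI()

question = "example eval question"  # stand-in for one of our real prompts

# Same prompt at each effort level; compare how many reasoning tokens get burned.
for effort in ("minimal", "low", "medium"):
    resp = client.responses.create(
        model="gpt-5",
        reasoning={"effort": effort},
        input=question,
    )
    details = resp.usage.output_tokens_details
    print(f"{effort:>8}: {details.reasoning_tokens} reasoning tokens "
          f"out of {resp.usage.output_tokens} output tokens")
```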
Based on these results I started to think, great, I can get away with low reasoning effort. However:
- Sometimes with low reasoning, the agent misses a critical step / tool call that ultimately causes it to produce an incorrect final response. An example: it does one batch of tool calls and gets partial information. On medium reasoning the model would do another batch and attempt to clarify ambiguities. On low it does not.
My experiments have been small in scale so far. If anyone else has insights here I’d love to hear them.