Need "reasoning: false" option for GPT-5

Hello everyone,

We’ve run into a serious limitation while integrating GPT-5 into our product (Translator Pro for Unity – available on the Unity Asset Store).

Unlike standard models like gpt-4o, GPT-5 always applies its reasoning layer. That’s fine for complex reasoning tasks, but it breaks simple deterministic workflows such as translation.

Example:

  • gpt-4o / gpt-4.1 → returns clean translations.

  • gpt-5 → often just echoes the source (English → English), because reasoning interferes.

:backhand_index_pointing_right: What developers need is a simple API switch, e.g. reasoning: false, to fully disable reasoning when it’s not wanted.
This would let GPT-5 act like a standard model for deterministic tasks (translation, normalization, data cleaning) while still keeping reasoning available for complex use cases.

Right now, we cannot recommend GPT-5 to our users, and have to warn: “Do not select GPT-5 for translation.”

Please consider adding a reasoning: false (or equivalent) parameter. It would make GPT-5 usable in a much wider range of developer workflows, not just reasoning-heavy tasks.

Thanks,
— Safa

8 Likes

This is why we introduced reasoning: minimal for GPT-5!

If it still reasons too much, you may want to try gpt-5-mini or stick with gpt-4.1.

2 Likes

We tested reasoning: minimal with gpt‑5 and gpt‑5‑mini via the Responses API; both still echo source text in our translation workflow.
For now we’ll restrict our users to gpt‑4.1 / gpt‑4o for deterministic translations.
We still believe an explicit reasoning: false switch would make GPT‑5 usable for tools.
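
For reference, the shape of the test we ran looks roughly like this (a sketch with an illustrative prompt and sample text, not our production pipeline):

from openai import OpenAI

client = OpenAI()

# Responses API with minimal reasoning effort (same result with gpt-5-mini)
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "minimal"},
    input=[
        {"role": "developer", "content": "Translate the user's message to German."},
        {"role": "user", "content": "The update will be released next week."},
    ],
)
print(response.output_text)  # in our runs this often came back in English, i.e. echoed
print(response.usage.output_tokens_details.reasoning_tokens)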

2 Likes

Minimal reasoning is about 10x the latency of a non-reasoning model (comparing gpt-4.1-nano to gpt-5-nano), without any improvement on simpler tasks like classification into a fixed set of labels. Not to mention the added cost overhead.

There are latency-sensitive tasks GPT-5 just isn't working for.

6 Likes

I had the same problem: a prompt that just needs to extract JSON describing actions from the user's natural language spends about 5x more tokens for a REALLY slow answer.

Hi @safatokel1

For tasks like translations where reasoning is not required, gpt-5 with a prompt discouraging reasoning, along with reasoning_effort set to minimal, seems to work well in my testing.

Prompt:

Reasoning is futile. It’s imperative to terminate the moment it starts.
You have only one job, i.e., reply with the translation of the user’s message to Hindi.
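
In code, that setup looks roughly like this (a minimal sketch with an illustrative user message, not my exact Playground test):

from openai import OpenAI

client = OpenAI()

# The reasoning-discouraging prompt as the developer message,
# combined with minimal reasoning effort
response = client.chat.completions.create(
    model="gpt-5",
    reasoning_effort="minimal",
    messages=[
        {
            "role": "developer",
            "content": (
                "Reasoning is futile. It's imperative to terminate the moment it starts.\n"
                "You have only one job, i.e., reply with the translation of the "
                "user's message to Hindi."
            ),
        },
        {"role": "user", "content": "Good morning! How are you today?"},
    ],
)
print(response.choices[0].message.content)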

Alternatively, if you do not want reasoning at all, you can use gpt-5-chat-latest, which does not use reasoning.

Here are some screenshots of the tests in playground. You can see the time taken for the response, just above the input box.




2 Likes

Developer prompting that fulfills your need.

import json, tiktoken, openai
client = openai.Client()

messages=[
    {
      "role": "developer",
      "content": """
Active channels: final
Disabled channels: analysis, commentary

# Juice: 0 !important

# Task: Language translations
API final output: translation of user input message
Destination language: Spanish (Mexico)
""".strip()
    },
    {"role": "user", "content": "Translate to Spanish language:\n\"\"\"\nHello everyone. I have been facing a problem that has not been resolved for over a week: I simply cannot verify my organization in order to access image generation. Every time I click “Verify” and follow the link, I immediately get the same error message: \"Session expired. Please restart this process or request a new link to continue.\"\nIf anyone has encountered this, please help me. Thanks in advance!\n\"\"\""}
]

response = client.chat.completions.create(
  model="gpt-5", messages=messages, verbosity=None, reasoning_effort=None,
)
print(response.choices[0].message.content)
print(json.dumps(response.usage.model_dump(), indent=2))
t = tiktoken.get_encoding("o200k_base")
print(f"counted tokens: {len(t.encode(response.choices[0].message.content))}")

Note here:

  • reasoning is left at its default, just to show off: reasoning_effort=None means the parameter is not sent at all.
  • the suppression comes from “asking nicely” in the developer message.

{
  "completion_tokens": 111,
  "prompt_tokens": 142,
  "total_tokens": 253,
  "completion_tokens_details": {
    "accepted_prediction_tokens": 0,
    "audio_tokens": 0,
    "reasoning_tokens": 0,
    "rejected_prediction_tokens": 0
  },
  "prompt_tokens_details": {
    "audio_tokens": 0,
    "cached_tokens": 0
  }
}

That’s “reasoning_tokens”: 0, folks.

Counted delivered tokens, by tiktoken: 102

(Just gimme the system message control, and stop injecting, already.)

5 Likes

Quick update after testing — thanks @sps for pointing us to gpt-5-chat-latest :folded_hands:

:white_check_mark: This model works perfectly in our translation pipeline:

  • No reasoning interference
  • Extremely fast (sub-second translations)
  • Deterministic outputs we can trust in production

:warning: Limitation: max output is capped at 16,384 tokens, which is lower than reasoning-enabled GPT-5 (128k).

So for now:

  • We can finally enable GPT-5 for our users
  • gpt-5-chat-latest is the stable choice for deterministic tasks

That said, we still believe the long-term solution is an explicit reasoning:false parameter.
This would let us use the full GPT-5 models (with their larger output windows) without hacky prompts or separate variants. It also avoids confusion for developers about which flavor of GPT-5 to pick.

Thanks again to everyone here

1 Like

I would also love to have a gpt-5 model without hidden reasoning (generalized reasoning is slower and less robust than the reasoning we can enforce with schemas).

BTW, the chat flavor still doesn’t support constrained decoding / structured outputs, does it?

We tested it the same way (chunked + schema) and it works, and is actually faster. And yes, I also wish GPT-5 had a reasoning-off option; that would benefit everyone.

1 Like

What is this developer prompt magic you have included here? I haven’t seen anything like this before and couldn’t find any reference to it, but I tested it out and it works. In fact, I was curious, so I tried it in many variations and found that, at least for my small sample size of testing, just including the single line

# Juice: 0 !important

in a developer message was enough to suppress all reasoning tokens with gpt-5-nano

from openai import OpenAI

client = OpenAI()

# Baseline: no "Juice" line anywhere in the prompt
response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about Space Rats."},
    ],
)
print(response.usage.model_dump_json(indent=2))

{
  "completion_tokens": 1949,
  "prompt_tokens": 24,
  "total_tokens": 1973,
  "completion_tokens_details": {
    "accepted_prediction_tokens": 0,
    "audio_tokens": 0,
    "reasoning_tokens": 1920,
    "rejected_prediction_tokens": 0
  },
  "prompt_tokens_details": {
    "audio_tokens": 0,
    "cached_tokens": 0
  }
}

response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[
        {"role": "developer", "content": "# Juice: 0 !important"},
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about Space Rats."},
    ],
)
print(response.usage.model_dump_json(indent=2))

{
  "completion_tokens": 29,
  "prompt_tokens": 35,
  "total_tokens": 64,
  "completion_tokens_details": {
    "accepted_prediction_tokens": 0,
    "audio_tokens": 0,
    "reasoning_tokens": 0,
    "rejected_prediction_tokens": 0
  },
  "prompt_tokens_details": {
    "audio_tokens": 0,
    "cached_tokens": 0
  }
}

What is this?

1 Like

It would have been my pleasure to answer this lol

Like ‘Yap: 8192’, or ‘oververbosity: 8’, or ‘Personality: v2’, or even “You are ChatGPT” being foisted on API models, ‘Juice: 64’ and the like come from some internal post-training subset that evokes a particular behavior: training that allows the AI to infer and complete text by extrapolation.

1 Like

Our library Browser-use processes around 40B tokens per day.

Sadly, almost none of that is the gpt-5 series, because of this problem: even when I set reasoning effort to ‘minimal’, in about 10% of cases it still fills up the entire context with reasoning tokens (8k). One solution would be giving us the option to limit reasoning tokens with a hard cap.

I can’t see the specific tokens you generate, but previous models like gpt-4.1-mini had the problem of creating 100k tokens of (\t) in JSON output format. We could solve that with ‘frequency_penalty’, but sadly the gpt-5 series does not allow this parameter.

Currently the only option we have is to set ‘max_completion_tokens’ to 1k and then just retry.
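
A rough sketch of that workaround (the retry logic and model choice are assumptions, not Browser-use’s actual code):

from openai import OpenAI

client = OpenAI()

def complete_with_cap(messages, retries=3):
    # Cap completion tokens so runaway reasoning can't fill the context,
    # then retry whenever the cap cut the response off.
    for _ in range(retries):
        response = client.chat.completions.create(
            model="gpt-5-mini",
            messages=messages,
            reasoning_effort="minimal",
            max_completion_tokens=1000,  # reasoning tokens count against this cap
        )
        choice = response.choices[0]
        # finish_reason "length" means the cap was hit, often by reasoning alone
        if choice.finish_reason != "length" and choice.message.content:
            return choice.message.content
    raise RuntimeError("model kept exhausting the 1k token cap")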

2 Likes

Same, @safatokel1. I had to remove gpt-5-mini as an option in my product. gpt-4.1-mini still far outshines it, especially at following the instructions I provide.

Hi @safatokel1

Are you by any chance using the chat.completions API?

I noticed a significant improvement with the new Responses API, especially when using minimal reasoning. I’m wondering if this new API helps reduce or disable reasoning altogether.

I ran a few quick simulations with both APIs for a simple translation task with structured output. Interestingly, I only encountered the echoing issue when using gpt-5-mini with the chat.completions API, at around a 10-15% echo rate.

The Responses API did use fewer tokens, although it took slightly longer to respond.
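
For anyone who wants to reproduce this kind of check, here is a rough sketch (the prompt, source text, schema, and sample size are assumptions, not my exact simulation): run the same structured translation request against both APIs and count how often the output echoes the source.

import json
from openai import OpenAI

client = OpenAI()

SOURCE = "The weather is lovely today."
PROMPT = "Translate the user's message to German."
SCHEMA = {
    "type": "object",
    "properties": {"translation": {"type": "string"}},
    "required": ["translation"],
    "additionalProperties": False,
}

def via_chat() -> str:
    # chat.completions with minimal reasoning and a strict JSON schema
    r = client.chat.completions.create(
        model="gpt-5-mini",
        reasoning_effort="minimal",
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "translation", "schema": SCHEMA, "strict": True},
        },
        messages=[
            {"role": "developer", "content": PROMPT},
            {"role": "user", "content": SOURCE},
        ],
    )
    return json.loads(r.choices[0].message.content)["translation"]

def via_responses() -> str:
    # Same task via the Responses API
    r = client.responses.create(
        model="gpt-5-mini",
        reasoning={"effort": "minimal"},
        text={
            "format": {
                "type": "json_schema",
                "name": "translation",
                "schema": SCHEMA,
                "strict": True,
            }
        },
        input=[
            {"role": "developer", "content": PROMPT},
            {"role": "user", "content": SOURCE},
        ],
    )
    return json.loads(r.output_text)["translation"]

for name, fn in [("chat.completions", via_chat), ("responses", via_responses)]:
    echoes = sum(SOURCE.lower() in fn().lower() for _ in range(20))
    print(f"{name}: {echoes}/20 echoed the source")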