Need "reasoning: false" option for GPT-5

Hello everyone,

We’ve run into a serious limitation while integrating GPT-5 into our product (Translator Pro for Unity – available on the Unity Asset Store).

Unlike standard models like gpt-4o, GPT-5 always applies its reasoning layer. That’s fine for complex reasoning tasks, but it breaks simple deterministic workflows such as translation.

Example:

  • gpt-4o / gpt-4.1 → returns clean translations.

  • gpt-5 → often just echoes the source (English → English), because reasoning interferes.

:backhand_index_pointing_right: What developers need is a simple API switch, e.g. reasoning: false, to fully disable reasoning when it’s not wanted.
This would let GPT-5 act like a standard model for deterministic tasks (translation, normalization, data cleaning) while still keeping reasoning available for complex use cases.

Right now, we cannot recommend GPT-5 to our users, and have to warn: “Do not select GPT-5 for translation.”

Please consider adding a reasoning: false (or equivalent) parameter. It would make GPT-5 usable in a much wider range of developer workflows, not just reasoning-heavy tasks.

Thanks,
— Safa

8 Likes

This is why we introduced reasoning: minimal for GPT-5!

If it still reasons too much, you may want to try gpt-5-mini or stick with gpt-4.1.

2 Likes

We tested reasoning: minimal with gpt‑5 and gpt‑5‑mini via the Responses API; both still echo source text in our translation workflow.
For now we’ll restrict our users to gpt‑4.1 / gpt‑4o for deterministic translations.
We still believe an explicit reasoning: false switch would make GPT‑5 usable for tools.
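
For reference, the shape of the test we ran looks roughly like this (a sketch with an illustrative prompt and sample text, not our production pipeline):

from openai import OpenAI

client = OpenAI()

# Responses API with minimal reasoning effort (same result with gpt-5-mini)
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "minimal"},
    input=[
        {"role": "developer", "content": "Translate the user's message to German."},
        {"role": "user", "content": "The update will be released next week."},
    ],
)
print(response.output_text)  # in our runs this often came back in English, i.e. echoed
print(response.usage.output_tokens_details.reasoning_tokens)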

2 Likes

Minimal reasoning is about 10x the latency of a non-reasoning model (comparing gpt-4.1-nano to gpt-5-nano), without any improvement on simpler tasks like classification into a fixed set of labels. Not to mention the added cost overhead.

There are latency-sensitive tasks GPT-5 just isn't working for.

6 Likes

I had the same problem: a prompt that just needs to extract JSON describing actions from the user's natural language spends about 5x more tokens for a REALLY slow answer.

Hi @safatokel1

For tasks like translations where reasoning is not required, gpt-5 with a prompt discouraging reasoning, along with reasoning_effort set to minimal, seems to work well in my testing.

Prompt:

Reasoning is futile. It’s imperative to terminate the moment it starts.
You have only one job, i.e., reply with the translation of the user’s message to Hindi.
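
In code, that setup looks roughly like this (a minimal sketch with an illustrative user message, not my exact Playground test):

from openai import OpenAI

client = OpenAI()

# The reasoning-discouraging prompt as the developer message,
# combined with minimal reasoning effort
response = client.chat.completions.create(
    model="gpt-5",
    reasoning_effort="minimal",
    messages=[
        {
            "role": "developer",
            "content": (
                "Reasoning is futile. It's imperative to terminate the moment it starts.\n"
                "You have only one job, i.e., reply with the translation of the "
                "user's message to Hindi."
            ),
        },
        {"role": "user", "content": "Good morning! How are you today?"},
    ],
)
print(response.choices[0].message.content)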

Alternatively, if you do not want reasoning at all, you can use gpt-5-chat-latest, which does not use reasoning.

Here are some screenshots of the tests in playground. You can see the time taken for the response, just above the input box.




2 Likes

Developer prompting that fulfills your need.

import json, tiktoken, openai
client = openai.Client()

messages=[
    {
      "role": "developer",
      "content": """
Active channels: final
Disabled channels: analysis, commentary

# Juice: 0 !important

# Task: Language translations
API final output: translation of user input message
Destination language: Spanish (Mexico)
""".strip()
    },
    {"role": "user", "content": "Translate to Spanish language:\n\"\"\"\nHello everyone. I have been facing a problem that has not been resolved for over a week: I simply cannot verify my organization in order to access image generation. Every time I click “Verify” and follow the link, I immediately get the same error message: \"Session expired. Please restart this process or request a new link to continue.\"\nIf anyone has encountered this, please help me. Thanks in advance!\n\"\"\""}
]

response = client.chat.completions.create(
  model="gpt-5", messages=messages, verbosity=None, reasoning_effort=None,
)
print(response.choices[0].message.content)
print(json.dumps(response.usage.model_dump(), indent=2))
t = tiktoken.get_encoding("o200k_base")
print(f"counted tokens: {len(t.encode(response.choices[0].message.content))}")

Note here:

  • reasoning is left at its default, just to show off: reasoning_effort=None means the parameter is not sent at all.
  • the suppression comes from “asking nicely” in the developer message.

{
  "completion_tokens": 111,
  "prompt_tokens": 142,
  "total_tokens": 253,
  "completion_tokens_details": {
    "accepted_prediction_tokens": 0,
    "audio_tokens": 0,
    "reasoning_tokens": 0,
    "rejected_prediction_tokens": 0
  },
  "prompt_tokens_details": {
    "audio_tokens": 0,
    "cached_tokens": 0
  }
}

That’s “reasoning_tokens”: 0, folks.

Counted delivered tokens, by tiktoken: 102

(Just gimme the system message control, and stop injecting, already.)

5 Likes

Quick update after testing — thanks @sps for pointing us to gpt-5-chat-latest :folded_hands:

:white_check_mark: This model works perfectly in our translation pipeline:

  • No reasoning interference
  • Extremely fast (sub-second translations)
  • Deterministic outputs we can trust in production

:warning: Limitation: max output is capped at 16,384 tokens, which is lower than reasoning-enabled GPT-5 (128k).

So for now:

  • We can finally enable GPT-5 for our users
  • gpt-5-chat-latest is the stable choice for deterministic tasks

That said, we still believe the long-term solution is an explicit reasoning:false parameter.
This would let us use the full GPT-5 models (with their larger output windows) without hacky prompts or separate variants. It also avoids confusion for developers about which flavor of GPT-5 to pick.

Thanks again to everyone here

1 Like

I would also love to have a gpt-5 model without hidden reasoning (generalized reasoning is slower and less robust than the reasoning we can enforce with schemas).

BTW, the chat flavor still doesn’t support constrained decoding / structured outputs, does it?

We tested it the same way (chunked + schema) and it works, and is actually faster. And yes, I also wish GPT-5 had a reasoning-off option; that would benefit everyone.

1 Like

What is this developer prompt magic you have included here? I haven’t seen anything like this before and couldn’t find any reference to it, but I tested it out and it works. In fact, I was curious, so I tried it in many variations and found that, at least for my small sample size of testing, just including the single line

# Juice: 0 !important

in a developer message was enough to suppress all reasoning tokens with gpt-5-nano

from openai import OpenAI

client = OpenAI()

# Baseline: no "Juice" line anywhere in the prompt
response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about Space Rats."},
    ],
)
print(response.usage.model_dump_json(indent=2))

{
  "completion_tokens": 1949,
  "prompt_tokens": 24,
  "total_tokens": 1973,
  "completion_tokens_details": {
    "accepted_prediction_tokens": 0,
    "audio_tokens": 0,
    "reasoning_tokens": 1920,
    "rejected_prediction_tokens": 0
  },
  "prompt_tokens_details": {
    "audio_tokens": 0,
    "cached_tokens": 0
  }
}

response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[
        {"role": "developer", "content": "# Juice: 0 !important"},
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about Space Rats."},
    ],
)
print(response.usage.model_dump_json(indent=2))

{
  "completion_tokens": 29,
  "prompt_tokens": 35,
  "total_tokens": 64,
  "completion_tokens_details": {
    "accepted_prediction_tokens": 0,
    "audio_tokens": 0,
    "reasoning_tokens": 0,
    "rejected_prediction_tokens": 0
  },
  "prompt_tokens_details": {
    "audio_tokens": 0,
    "cached_tokens": 0
  }
}

What is this?

1 Like

It would have been my pleasure to answer this lol

Like ‘Yap: 8192’, or ‘oververbosity: 8’, or ‘Personality: v2’, or even “You are ChatGPT” being foisted on API models, ‘Juice: 64’ and the like come from some internal post-training subset that evokes a particular behavior: training that allows the AI to infer and complete text by extrapolation.

1 Like

Our library Browser-use processes around 40B tokens per day.

Sadly, almost none of that is the gpt-5 series, because of this problem: even when I set reasoning effort to ‘minimal’, in about 10% of cases it still fills up the entire context with reasoning tokens (8k). One solution would be giving us the option to limit reasoning tokens with a hard cap.

I can’t see the specific tokens you generate, but previous models like gpt-4.1-mini had the problem of creating 100k tokens of (\t) in JSON output format. We could solve that with ‘frequency_penalty’, but sadly the gpt-5 series does not allow this parameter.

Currently the only option we have is to set ‘max_completion_tokens’ to 1k and then just retry.
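
A rough sketch of that workaround (the retry logic and model choice are assumptions, not Browser-use’s actual code):

from openai import OpenAI

client = OpenAI()

def complete_with_cap(messages, retries=3):
    # Cap completion tokens so runaway reasoning can't fill the context,
    # then retry whenever the cap cut the response off.
    for _ in range(retries):
        response = client.chat.completions.create(
            model="gpt-5-mini",
            messages=messages,
            reasoning_effort="minimal",
            max_completion_tokens=1000,  # reasoning tokens count against this cap
        )
        choice = response.choices[0]
        # finish_reason "length" means the cap was hit, often by reasoning alone
        if choice.finish_reason != "length" and choice.message.content:
            return choice.message.content
    raise RuntimeError("model kept exhausting the 1k token cap")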

2 Likes

Same, @safatokel1. I had to remove gpt-5-mini as an option in my product. gpt-4.1-mini still far outshines it, especially at following the instructions I provide.

Hi @safatokel1

Are you by any chance using the chat.completions API?

I noticed a significant improvement with the new Responses API, especially when using minimal reasoning. I’m wondering if this new API helps reduce or disable reasoning altogether.

I ran a few quick simulations with both APIs for a simple translation task with structured output. Interestingly, I only encountered the echoing issue when using gpt-5-mini with the chat.completions API, at around a 10-15% echo rate.

The Responses API did use fewer tokens, although it took slightly longer to respond.
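
For anyone who wants to reproduce this kind of check, here is a rough sketch (the prompt, source text, schema, and sample size are assumptions, not my exact simulation): run the same structured translation request against both APIs and count how often the output echoes the source.

import json
from openai import OpenAI

client = OpenAI()

SOURCE = "The weather is lovely today."
PROMPT = "Translate the user's message to German."
SCHEMA = {
    "type": "object",
    "properties": {"translation": {"type": "string"}},
    "required": ["translation"],
    "additionalProperties": False,
}

def via_chat() -> str:
    # chat.completions with minimal reasoning and a strict JSON schema
    r = client.chat.completions.create(
        model="gpt-5-mini",
        reasoning_effort="minimal",
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "translation", "schema": SCHEMA, "strict": True},
        },
        messages=[
            {"role": "developer", "content": PROMPT},
            {"role": "user", "content": SOURCE},
        ],
    )
    return json.loads(r.choices[0].message.content)["translation"]

def via_responses() -> str:
    # Same task via the Responses API
    r = client.responses.create(
        model="gpt-5-mini",
        reasoning={"effort": "minimal"},
        text={
            "format": {
                "type": "json_schema",
                "name": "translation",
                "schema": SCHEMA,
                "strict": True,
            }
        },
        input=[
            {"role": "developer", "content": PROMPT},
            {"role": "user", "content": SOURCE},
        ],
    )
    return json.loads(r.output_text)["translation"]

for name, fn in [("chat.completions", via_chat), ("responses", via_responses)]:
    echoes = sum(SOURCE.lower() in fn().lower() for _ in range(20))
    print(f"{name}: {echoes}/20 echoed the source")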