GPT-5.4 deep dive: pricing, context limits, and tool search explained

The announcement covers the headlines well. This is the companion post covering the specifics you’ll want to know before your first API call, so you’re not figuring them out mid-build.


:bulb: The 1M context window is opt-in (and worth understanding first)

The 1M token context is real, but it’s an experimental feature you enable explicitly by configuring model_context_window and model_auto_compact_token_limit.

Without those params, you’re on the standard 272K window. Requests that go beyond 272K count against usage limits at 2x the normal rate, so it’s worth sizing your workloads intentionally before enabling it.
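A minimal sketch of the opt-in as it might look in a Codex config.toml, assuming both parameters take token counts (the parameter names come from the announcement; check the docs for exact values and placement):

```toml
# Experimental: opt in to the 1M-token context window.
# Without these, you stay on the standard 272K window.
model_context_window = 1000000

# Compact the conversation before the window fills
# (the 900K threshold here is an illustrative guess).
model_auto_compact_token_limit = 900000
```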

For most tasks, 272K is plenty. The 1M path is there when you genuinely need it.


:moneybag: Pricing — and why token efficiency changes the math

| Model | Input | Cached Input | Output |
| --- | --- | --- | --- |
| gpt-5.2 | $1.75 / M | $0.175 / M | $14 / M |
| gpt-5.4 | $2.50 / M | $0.25 / M | $15 / M |
| gpt-5.2-pro | $21 / M | n/a | $168 / M |
| gpt-5.4-pro | $30 / M | n/a | $180 / M |

The per-token price is higher, but GPT-5.4 is meaningfully more token-efficient — in OpenAI’s MCP Atlas tests, tool search alone reduced total token usage by 47%.

Before assuming it’s a cost increase, it’s worth running a comparison on your actual workload. For tool-heavy agents especially, the net cost may surprise you.
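As a rough sanity check, here’s that comparison for a hypothetical tool-heavy workload, using the list prices above and the 47% reduction figure. Both the workload size and the assumption that the reduction applies uniformly to input and output are illustrative:

```python
# Hypothetical workload: 500K input + 50K output tokens on gpt-5.2.
GPT52_IN, GPT52_OUT = 1.75, 14.0   # $ per 1M tokens
GPT54_IN, GPT54_OUT = 2.50, 15.0

in_tok, out_tok = 500_000, 50_000
cost_52 = in_tok / 1e6 * GPT52_IN + out_tok / 1e6 * GPT52_OUT

# Assume tool search cuts total tokens by 47% (the MCP Atlas figure).
reduction = 0.47
cost_54 = (in_tok * (1 - reduction)) / 1e6 * GPT54_IN \
        + (out_tok * (1 - reduction)) / 1e6 * GPT54_OUT

print(f"gpt-5.2: ${cost_52:.3f}  gpt-5.4: ${cost_54:.3f}")
```

Under these assumptions the higher per-token price still nets out cheaper; with fewer tools in play the reduction shrinks and the math can flip, which is why measuring your own workload matters.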

Batch and Flex processing remain available at half the standard rate. Priority processing (the API equivalent of /fast mode in Codex) is available at 2x.
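Priority processing is selected per request via the service_tier parameter. A sketch of the request shape (the model name and prompt are placeholders, and the actual call is commented out since it needs an API key):

```python
# Request shape for priority processing; service_tier also accepts
# "flex" for half-rate processing. The prompt is a placeholder.
request = {
    "model": "gpt-5.4",
    "input": "Summarize the latest deploy logs.",
    "service_tier": "priority",  # 2x the standard rate
}

# from openai import OpenAI
# client = OpenAI()
# response = client.responses.create(**request)

print(request["service_tier"])
```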


:mag: Tool search: how to set it up

Tool search is one of the most impactful new capabilities for agent builders, and it requires explicit setup rather than being on by default.

Instead of loading all tool definitions into the prompt upfront, the model receives a lightweight tool list and fetches definitions on demand. The result: smaller prompts, preserved cache, and the ability to work across much larger tool ecosystems.

The 47% token reduction number comes from running all 36 MCP servers in Scale’s MCP Atlas benchmark. For MCP-heavy setups, the efficiency gains are substantial. See the tool search guide for setup details.
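The mechanics are easy to sketch outside any particular API. This toy version (all names hypothetical, not the real tool-search interface) shows why deferred definitions shrink the prompt: only tool names go in upfront, and full schemas are fetched when a tool is actually invoked:

```python
import json

# Toy registry standing in for a large tool ecosystem (names hypothetical).
TOOL_DEFS = {
    f"tool_{i}": {
        "name": f"tool_{i}",
        "description": "does something useful",
        "parameters": {"type": "object"},
    }
    for i in range(200)
}

def upfront_prompt_tokens() -> int:
    """Naive approach: every full definition goes into the prompt."""
    return len(json.dumps(list(TOOL_DEFS.values()))) // 4  # ~4 chars/token

def tool_search_prompt_tokens() -> int:
    """Tool-search approach: only a lightweight name list goes upfront."""
    return len(json.dumps(sorted(TOOL_DEFS))) // 4

def fetch_definition(name: str) -> dict:
    """Fetched on demand, only when the model decides to call the tool."""
    return TOOL_DEFS[name]

print(upfront_prompt_tokens(), tool_search_prompt_tokens())
```

The upfront savings also compound with prompt caching: a stable, small tool list means the cached prefix survives across turns even as tools come and go.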


:desktop_computer: Computer use: get your image detail level right

First, the headline claim from the announcement:

> GPT‑5.4 is our first general-purpose model with native computer-use capabilities.

OpenAI introduced a new original image input detail level alongside changes to high:

  • original: up to 10.24M pixels, 6000px max dimension — new
  • high (updated): up to 2.56M pixels, 2048px max dimension

Early testing showed strong gains in localization and click accuracy when using original or high. If you’re building computer use agents, setting the right detail level upfront will meaningfully affect your results — it’s one of the first things to dial in.
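In the Responses-style input format, detail is set per image. A sketch (the URL is a placeholder, and original is the new tier described above):

```python
# Image input with an explicit detail level ("original" is the new tier
# described above; the URL is a placeholder).
screenshot_input = {
    "type": "input_image",
    "image_url": "https://example.com/screenshot.png",
    "detail": "original",  # up to 10.24M pixels, 6000px max dimension
}

# A 1920x1080 screenshot (~2.07M pixels) still fits within the updated
# "high" cap of 2.56M pixels; larger desktop captures may need "original".
assert 1920 * 1080 <= 2_560_000

print(screenshot_input["detail"])
```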


:brain: Reasoning effort levels and what the benchmarks actually reflect

Most benchmark numbers in the announcement were measured at reasoning_effort=xhigh. Performance at none looks different — though GPT-5.4 at none still outperforms GPT-5.2 on latency-sensitive tasks like τ²-bench Telecom (64.3% vs 57.2%).

For production workloads, it’s worth benchmarking at the reasoning effort you’ll actually use rather than defaulting to xhigh everywhere. The model is efficient enough at lower effort levels that you may not need to reach for xhigh as often as with previous models.
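Reasoning effort is set per request. A sketch of benchmarking the same prompt at both extremes (the prompt is a placeholder, and the live calls are commented out since they need an API key):

```python
# Effort levels referenced in this post; "xhigh" is the benchmark setting.
EFFORT_LEVELS = ["none", "xhigh"]

def make_request(effort: str) -> dict:
    """Build a request body; reasoning effort is set per call.
    The model name and prompt are placeholders."""
    return {
        "model": "gpt-5.4",
        "input": "Diagnose this failing telecom workflow.",
        "reasoning": {"effort": effort},
    }

# Benchmark loop (commented out: needs an API key and network access):
# import time
# from openai import OpenAI
# client = OpenAI()
# for effort in EFFORT_LEVELS:
#     t0 = time.time()
#     client.responses.create(**make_request(effort))
#     print(effort, time.time() - t0)

print([make_request(e)["reasoning"]["effort"] for e in EFFORT_LEVELS])
```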


:high_voltage: Fast mode

When toggled on, /fast mode in Codex delivers up to 1.5x faster token velocity with GPT‑5.4. It’s the same model and the same intelligence, just faster. That means users can move through coding tasks, iteration, and debugging while staying in flow. Developers can access GPT‑5.4 at the same fast speeds via the API by using priority processing⁠.


:calendar: ChatGPT plan rollout details

  • Plus, Team, Pro: GPT-5.4 Thinking available today, replaces GPT-5.2 Thinking
  • Enterprise and Edu: early access via admin settings — needs to be manually enabled
  • GPT-5.2 Thinking: moves to Legacy Models, retires June 5, 2026
  • GPT-5.4 Pro: available on Pro and Enterprise plans

Context windows in ChatGPT for GPT-5.4 Thinking are unchanged from GPT-5.2 Thinking.


The benchmark worth bookmarking

OSWorld-Verified tests desktop navigation via screenshots + mouse/keyboard actions:

GPT-5.4:  75.0%
Human:    72.4%
GPT-5.2:  47.3%

That jump from 47% to 75%, past human level, is a meaningful signal for anyone building computer use agents. It’s the number that most changes what’s now reasonable to attempt.


Curious what workloads folks here are planning to test first. To me tool search and computer use feel like the areas with the most unexplored surface right now.

8 Likes

Very interesting - I’m curious to see how this works out.

And I see they went to the trouble of including a local tool search option - very nice!

4 Likes

I just tried this on a recently updated codex-cli with no luck - is this just an app thing?

And what’s the downside? Reaching rate limits quicker? Why wouldn’t everyone just switch this on?

update: Oh lack of /fast could simply be latency in the package release schedule.

Searching the repo reveals the option …

1 Like

How about: When it breaks, you can switch it off.

If “API” gets the same thing at 2x the price by setting priority … does it chew through your paid ChatGPT usage and then Codex credits twice as fast also??

1 Like

OK, a manual `npm install -g @openai/codex` updates to the required version now (don’t wait for codex to prompt you). /fast is then an option. vroooooom :racing_car:

4 Likes

/fast enables significantly faster inference at 3× the usage rate.

Think of it like switching Codex from deep reasoning mode to low latency mode.

Source

3 Likes

Yes you’d need to update to the newest version of codex-cli before you can use it there.

Step 1:

Update codex-cli

$ npm install -g @openai/codex@latest

Step 2:

Start codex and go to the model picker by typing /model and then select gpt-5.4

then select the reasoning effort:

NOTE: To enable fast mode simply type /fast in codex and you get the fastest inference at 2x plan usage.

2 Likes

As linked above by VB (aka velocity bandwidth)

Funny how everyone got slowed down like flipping a light switch right before “service_tier”:“priority” came to all on the API…with no discount.

1 Like

Datacentre manager had a meltdown … and after a few phone calls it became 2x (I’m guessing :wink: )

3 Likes

My app updated and I set ‘fast’ 5.4, but I don’t see /fast in my config.toml. I’m going to update the CLI to see if that makes it show up. Basically I can’t turn it off.

Just confirmed: it’s 2x the usage and 1.5x faster.

edited old post

Today I saw that additional token usage for fast mode is 2x. And then somewhere else it was mentioned that fast token usage is halved until early April.

So I honestly have no clue what the baseline is.

4 Likes

To add:

It’s ultimately back to parity with 3x the usage, when you look at how much less usage (in “estimated turns”) you get when using a ChatGPT subscription for gpt-5.4 vs gpt-5.3-codex in codex:


This topic is tagged “API” so debate over /fast or its subscription consumption should be irrelevant. Now: does the codex software actually send “service_tier”:“priority” when using an API key?

2 Likes