Hard usage limits with no visibility are breaking agent workflows (Codex + ChatGPT subscription)

I’m running an agent-based workflow (Hermes) using ChatGPT OAuth with the Codex backend, and I’m consistently hitting a major usability issue: hard usage limits with zero visibility or warning.

This isn’t about wanting more quota — it’s about not being able to plan or complete work reliably.

Here’s what’s happening in practice:

  • The agent hits:
    HTTP 429: usage_limit_reached
    plan_type: plus
    resets_in_seconds: 8500–15000+

  • This results in full lockouts of 2–4+ hours.

  • There is no warning beforehand, no indication I’m close to a limit, and no way to estimate whether a task will complete.

  • When the limit is hit:

    • The system stops immediately (hard stop)

    • I cannot send a “pause”, “stop”, or “summarize state” command

    • The agent cannot checkpoint progress

    • The job is effectively lost
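Since the error payload does report `resets_in_seconds`, an agent wrapper can at least checkpoint locally and schedule its own resume instead of losing the job outright. A minimal sketch, assuming the error surfaces client-side as a dict with the fields shown above (the exact wire format and field names are an assumption here):

```python
import time


def handle_usage_limit(error_payload, checkpoint_fn):
    """On a usage_limit_reached error, persist local state and
    return the earliest timestamp at which a retry makes sense.

    `error_payload` mirrors the fields quoted above (`plan_type`,
    `resets_in_seconds`); `checkpoint_fn` is whatever the agent
    uses to save its in-progress state.
    """
    if error_payload.get("code") != "usage_limit_reached":
        raise ValueError("not a usage-limit error")

    wait = int(error_payload.get("resets_in_seconds", 0))
    checkpoint_fn()                # save whatever state we still hold locally
    return time.time() + wait      # caller sleeps/reschedules until this moment


# Example: a lockout reporting resets_in_seconds of 8500 (~2.4 hours)
saved = []
resume_at = handle_usage_limit(
    {"code": "usage_limit_reached", "plan_type": "plus", "resets_in_seconds": 8500},
    checkpoint_fn=lambda: saved.append("state"),
)
```

This only rescues state the agent already holds on its side; it cannot recover the final "summarize progress" call the backend refuses, which is exactly the gap described above.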

This is especially problematic for agents because:

  • One user task may generate many internal calls (planning, tool use, retries)

  • Usage is consumed much faster than expected

  • There is no visibility into how much is being consumed per task

I’ve also seen inconsistent model usage:

  • Even when configured for gpt-5.4-mini, logs show requests hitting gpt-5.4

  • There is no transparency into fallback behavior

  • This can spike usage unexpectedly and trigger lockouts


What would solve this:

  1. Usage transparency
  • A visible usage meter or remaining capacity estimate

  • A warning before hitting limits

  • A pre-flight check: “this task may exceed remaining usage”

  2. Graceful limit handling
  • Let in-progress tasks finish, OR

  • Provide a wind-down buffer instead of a hard stop

  • Allow at least one final request (e.g., summarize progress)

  3. Dynamic model routing
  • Allow agents to use multiple available models intelligently

  • Use stronger models for reasoning, lighter models for simple steps

  • Automatically downgrade when nearing limits

  • Prevent silent fallback to heavier models

  4. Agent-aware controls
  • Task size estimation (small / medium / large)

  • Checkpointing during long tasks

  • Ability to block tasks that cannot complete within remaining usage
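Points 3 and 4 above can be combined client-side even today: route each step to a model based on its kind and the estimated remaining budget, downgrading or blocking as the budget shrinks. A hypothetical sketch; the model names and the thresholds are illustrative, and `remaining_fraction` would have to come from the agent's own tracking since the backend exposes no meter:

```python
def route_model(step_kind, remaining_fraction):
    """Pick a model per agent step, downgrading as the budget shrinks.

    step_kind          -- e.g. "reasoning" for planning steps, anything
                          else for simple tool calls / retries
    remaining_fraction -- estimated fraction of usage budget left (0..1),
                          supplied by the agent's own tracker
    Returns a model name, or None to block the step entirely.
    """
    if remaining_fraction < 0.1:
        return None                    # block: step likely cannot complete
    if step_kind == "reasoning" and remaining_fraction > 0.3:
        return "gpt-5.4"               # stronger model while budget allows
    return "gpt-5.4-mini"              # lighter model for simple steps or low budget
```

The key property is that the downgrade is explicit and chosen by the agent, which is the opposite of the silent fallback to heavier models described earlier.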

Right now, the system behaves like a black box with a hard cutoff. That makes agent workflows unreliable because work can be interrupted with no warning and no recovery path.

Again — not asking for unlimited usage. Just enough transparency and control to use the system responsibly.

Would love to hear if others are running into the same issue or have found workarounds.

Not trying to sound like a jerk, but this is in no way a Codex or OpenAI issue. They provide usage information in their own tools (CLI and desktop) and on the website. It’s not their job to cater to every agentic tool on the market. On top of that, you’re probably going to have a rough time trying to run Hermes on a Plus plan anyway. That said, it’s easy to either create your own usage-tracking tools/dashboards or find one of the many open-source options and try that. Or just ask Hermes to create one for you.

I get what you’re saying, and I agree that OpenAI doesn’t need to support every third-party agent directly.

But the issue I’m running into isn’t really Hermes-specific — it’s how the limits behave in the Codex + ChatGPT subscription environment itself.

The main problem is the combination of:

  • no visibility into remaining usage

  • no warning before hitting the limit

  • and a hard stop that prevents even a final “summarize state” or “pause” command

Once the limit is hit, the system is completely blocked, which makes it hard to use responsibly in any workflow that involves multiple steps or longer tasks.

Even if I built my own tracker, I wouldn’t have access to:

  • actual remaining capacity

  • how much each request consumes

  • or the hidden calls an agent might make

So it becomes guesswork rather than something you can reliably plan around.

I’m not really asking OpenAI to support Hermes specifically — more that the underlying system could provide:

  • a basic usage estimate or warning

  • and a way to wind down safely instead of a hard cutoff

That would make a big difference even for non-agent use cases, especially for anything more complex than single prompts.

You absolutely can build tools that do all of those things. I have a dashboard that tracks my usage in real time, and you can wire those tools up to your agents so they’re aware of usage and adjust their workflows accordingly. I hear what you’re saying about the abrupt cutoff, but that’s just how usage limits work. The same applies to any other service: once the bill is up or the usage is gone, it just stops, and rarely will a company make an exception just because you’re in the middle of working on something. That’s why it’s important to make yourself and/or your agents aware of current usage and track it, and for me a custom-built dashboard was the best way to do that. It’s not a hard task for these agents to build.
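The core of such a dashboard is just a running tally. A minimal sketch of the tracking side, with the caveat raised earlier in the thread: this is a client-side estimate against a budget you guess yourself, not the provider's real meter, so hidden calls and fallbacks will still throw it off:

```python
class UsageTracker:
    """Client-side running tally of estimated usage.

    `budget_tokens` is the user's own guess at the plan's capacity per
    reset window -- an assumption, since the backend does not expose it.
    """

    def __init__(self, budget_tokens):
        self.budget = budget_tokens
        self.used = 0

    def record(self, tokens):
        """Add the token count of one request (from the API response usage)."""
        self.used += tokens

    def remaining_fraction(self):
        """Estimated fraction of the budget still available, clamped at 0."""
        return max(0.0, 1.0 - self.used / self.budget)

    def should_warn(self, threshold=0.2):
        """True once the estimate drops to the warning threshold or below."""
        return self.remaining_fraction() <= threshold


# Example: a 1,000,000-token guessed budget, 900k already consumed
tracker = UsageTracker(budget_tokens=1_000_000)
tracker.record(900_000)
```

An agent can consult `should_warn()` before starting a large step and checkpoint or downscale preemptively, which recovers some of the "warning before the limit" behavior the original post asks for, minus the accuracy only the provider could give.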