Hard usage limits with no visibility are breaking agent workflows (Codex + ChatGPT subscription)

I’m running an agent-based workflow (Hermes) using ChatGPT OAuth with the Codex backend, and I’m consistently hitting a major usability issue: hard usage limits with zero visibility or warning.

This isn’t about wanting more quota — it’s about not being able to plan or complete work reliably.

Here’s what’s happening in practice:

  • The agent hits:
    HTTP 429: usage_limit_reached
    plan_type: plus
    resets_in_seconds: 8500–15000+

  • This results in full lockouts of 2–4+ hours.

  • There is no warning beforehand, no indication I’m close to a limit, and no way to estimate whether a task will complete.

  • When the limit is hit:

    • The system stops immediately (hard stop)

    • I cannot send a “pause”, “stop”, or “summarize state” command

    • The agent cannot checkpoint progress

    • The job is effectively lost
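Since the error payload does report `resets_in_seconds`, an agent wrapper can at least checkpoint locally and schedule its own resume instead of losing the job outright. A minimal sketch, assuming the error surfaces client-side as a dict with the fields shown above (the exact wire format and field names are an assumption here):

```python
import time


def handle_usage_limit(error_payload, checkpoint_fn):
    """On a usage_limit_reached error, persist local state and
    return the earliest timestamp at which a retry makes sense.

    `error_payload` mirrors the fields quoted above (`plan_type`,
    `resets_in_seconds`); `checkpoint_fn` is whatever the agent
    uses to save its in-progress state.
    """
    if error_payload.get("code") != "usage_limit_reached":
        raise ValueError("not a usage-limit error")

    wait = int(error_payload.get("resets_in_seconds", 0))
    checkpoint_fn()                # save whatever state we still hold locally
    return time.time() + wait      # caller sleeps/reschedules until this moment


# Example: a lockout reporting resets_in_seconds of 8500 (~2.4 hours)
saved = []
resume_at = handle_usage_limit(
    {"code": "usage_limit_reached", "plan_type": "plus", "resets_in_seconds": 8500},
    checkpoint_fn=lambda: saved.append("state"),
)
```

This only rescues state the agent already holds on its side; it cannot recover the final "summarize progress" call the backend refuses, which is exactly the gap described above.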

This is especially problematic for agents because:

  • One user task may generate many internal calls (planning, tool use, retries)

  • Usage is consumed much faster than expected

  • There is no visibility into how much is being consumed per task

I’ve also seen inconsistent model usage:

  • Even when configured for gpt-5.4-mini, logs show requests hitting gpt-5.4

  • There is no transparency into fallback behavior

  • This can spike usage unexpectedly and trigger lockouts


What would solve this:

  1. Usage transparency
  • A visible usage meter or remaining capacity estimate

  • A warning before hitting limits

  • A pre-flight check: “this task may exceed remaining usage”

  2. Graceful limit handling
  • Let in-progress tasks finish, OR

  • Provide a wind-down buffer instead of a hard stop

  • Allow at least one final request (e.g., summarize progress)

  3. Dynamic model routing
  • Allow agents to use multiple available models intelligently

  • Use stronger models for reasoning, lighter models for simple steps

  • Automatically downgrade when nearing limits

  • Prevent silent fallback to heavier models

  4. Agent-aware controls
  • Task size estimation (small / medium / large)

  • Checkpointing during long tasks

  • Ability to block tasks that cannot complete within remaining usage
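Points 3 and 4 above can be combined client-side even today: route each step to a model based on its kind and the estimated remaining budget, downgrading or blocking as the budget shrinks. A hypothetical sketch; the model names and the thresholds are illustrative, and `remaining_fraction` would have to come from the agent's own tracking since the backend exposes no meter:

```python
def route_model(step_kind, remaining_fraction):
    """Pick a model per agent step, downgrading as the budget shrinks.

    step_kind          -- e.g. "reasoning" for planning steps, anything
                          else for simple tool calls / retries
    remaining_fraction -- estimated fraction of usage budget left (0..1),
                          supplied by the agent's own tracker
    Returns a model name, or None to block the step entirely.
    """
    if remaining_fraction < 0.1:
        return None                    # block: step likely cannot complete
    if step_kind == "reasoning" and remaining_fraction > 0.3:
        return "gpt-5.4"               # stronger model while budget allows
    return "gpt-5.4-mini"              # lighter model for simple steps or low budget
```

The key property is that the downgrade is explicit and chosen by the agent, which is the opposite of the silent fallback to heavier models described earlier.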

Right now, the system behaves like a black box with a hard cutoff. That makes agent workflows unreliable because work can be interrupted with no warning and no recovery path.

Again — not asking for unlimited usage. Just enough transparency and control to use the system responsibly.

Would love to hear if others are running into the same issue or have found workarounds.

Not trying to sound like a jerk, but this is in no way a Codex or OpenAI issue. They provide usage information in their own tools (CLI and desktop) and on the website. It’s not their job to cater to every agentic tool on the market. On top of that, you’re probably going to have a rough time trying to run Hermes on a Plus plan anyway. That said, it’s easy to either create your own usage-tracking tools/dashboards or find one of the many open-source options and try that. Or just ask Hermes to create one for you.

I get what you’re saying, and I agree that OpenAI doesn’t need to support every third-party agent directly.

But the issue I’m running into isn’t really Hermes-specific — it’s how the limits behave in the Codex + ChatGPT subscription environment itself.

The main problem is the combination of:

  • no visibility into remaining usage

  • no warning before hitting the limit

  • and a hard stop that prevents even a final “summarize state” or “pause” command

Once the limit is hit, the system is completely blocked, which makes it hard to use responsibly in any workflow that involves multiple steps or longer tasks.

Even if I built my own tracker, I wouldn’t have access to:

  • actual remaining capacity

  • how much each request consumes

  • or the hidden calls an agent might make

So it becomes guesswork rather than something you can reliably plan around.

I’m not really asking OpenAI to support Hermes specifically — more that the underlying system could provide:

  • a basic usage estimate or warning

  • and a way to wind down safely instead of a hard cutoff

That would make a big difference even for non-agent use cases, especially for anything more complex than single prompts.

You absolutely can build tools that do all of those things. I have a dashboard that tracks my usage in real time, and you can wire those tools up to your agents so they’re aware of usage and adjust their workflows accordingly. I hear what you’re saying about the abrupt cutoff, but that’s just how usage limits work. The same applies to any other service: once the bill is up or the usage is gone, it just stops, and rarely will a company make an exception just because you’re in the middle of working on something. That’s why it’s important to make yourself and/or your agents aware of current usage and track it, and for me a custom-built dashboard was the best way to do that. It’s not a hard task for these agents to build.
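The core of such a dashboard is just a running tally. A minimal sketch of the tracking side, with the caveat raised earlier in the thread: this is a client-side estimate against a budget you guess yourself, not the provider's real meter, so hidden calls and fallbacks will still throw it off:

```python
class UsageTracker:
    """Client-side running tally of estimated usage.

    `budget_tokens` is the user's own guess at the plan's capacity per
    reset window -- an assumption, since the backend does not expose it.
    """

    def __init__(self, budget_tokens):
        self.budget = budget_tokens
        self.used = 0

    def record(self, tokens):
        """Add the token count of one request (from the API response usage)."""
        self.used += tokens

    def remaining_fraction(self):
        """Estimated fraction of the budget still available, clamped at 0."""
        return max(0.0, 1.0 - self.used / self.budget)

    def should_warn(self, threshold=0.2):
        """True once the estimate drops to the warning threshold or below."""
        return self.remaining_fraction() <= threshold


# Example: a 1,000,000-token guessed budget, 900k already consumed
tracker = UsageTracker(budget_tokens=1_000_000)
tracker.record(900_000)
```

An agent can consult `should_warn()` before starting a large step and checkpoint or downscale preemptively, which recovers some of the "warning before the limit" behavior the original post asks for, minus the accuracy only the provider could give.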