Codex “5 hours” usage limit feels misleading and unacceptable for paid users

I consider the way Codex usage limits are presented to be misleading.

The product shows a “5 hours” usage limit/window, but in real work on a serious repository this allowance can be exhausted in about one hour, or after only a few tasks. For a normal customer, “5 hours” clearly sounds like available working time. In reality, it behaves more like a usage/compute/token budget that can disappear extremely fast depending on repository size, context size, model choice, task complexity, and token consumption.

This is not transparent enough.

If the limit is actually based on credits, compute, tokens, or internal usage cost, then the interface should not present it primarily as “hours” without a clear warning and a detailed breakdown of what consumes the allowance. Showing “5 hours” while the practical usable time can be around one hour feels misleading.

I am extremely disappointed with this experience.

There are different speed options, but from a user perspective the displayed “5 hours” still creates the expectation of usable working time. If a fast or top model can burn through the allowance after only a few tasks, the UI must make that obvious before the user pays. Otherwise, OpenAI could theoretically display any large number of “hours” while the real usable work time is much smaller. That is not a fair way to present limits to paying users.

In my case, Codex feels severely restricted for real project work. I paid expecting a practical coding assistant, but instead I get a system that can consume the whole allowance very quickly and then leaves me unable to continue meaningful work.

This is especially frustrating because the top model may be useful for large and complex tasks, but it can also consume the entire allowance too quickly. After that, the user is stuck. There should be a better fallback for paid users: for example, allow continued daily use with a lower model after the top-model allowance is exhausted, so users can still perform smaller routine tasks while waiting for the main limit to reset.

Right now, the experience feels like I paid for “5 hours” of Codex, but in practice I may get only about an hour or a few serious runs before the system becomes unusable for real work.

I am not satisfied with Codex in its current state. The limits are too restrictive, the usage calculation is not transparent enough, and the “hours” presentation creates expectations that do not match the actual experience.

Please clarify:

  1. How exactly Codex usage is calculated.
  2. Why the limit is presented as “hours” if actual usable time can be much shorter.
  3. What specifically consumes the allowance so quickly.
  4. Whether OpenAI plans to make Codex limits more transparent.
  5. Whether paid users will get a lower-model fallback after the top-model allowance is exhausted.
  6. Whether users affected by this misleading presentation are eligible for credit or refund.

Codex could be a strong product, but the current limit presentation and real usage experience feel unacceptable for the price.

P.S. Another serious issue is that Codex appears to consume usage even while reading project/system context at the start of a session: repository instructions, project files, startup context, tool context, and other required information before the user even receives useful work.

If this startup/context-reading process consumes the same allowance, then the system is even less fair to paying users. A user should not lose a large part of the limit just because Codex needs to load and understand the project before doing the actual task.

If context size is a major factor, OpenAI should make this transparent and fair. Either limit the amount of context a user can provide before it becomes wasteful, clearly warn the user that the current project/context will consume a large part of the allowance, or exclude mandatory startup/project-reading overhead from the paid usage limit.

Right now it feels like Codex charges the user for everything: reading files, loading context, understanding instructions, choosing tools, and only then maybe doing the actual work. At this rate, it feels like even small interactions could eventually be counted against the limit. This is extremely frustrating and makes the product feel unpredictable and unfair.

The user needs a clear usage breakdown: how much was spent on reading context, how much on actual coding, how much on tool execution, and how much on model reasoning. Without that transparency, the “hours” limit is not meaningful.
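
To make the request concrete, here is a minimal sketch of what such a breakdown could look like. Everything in it is hypothetical: the `UsageBreakdown` class, its field names, and the sample numbers are my own illustration of the four categories above, not anything Codex actually exposes.

```python
from dataclasses import dataclass


@dataclass
class UsageBreakdown:
    """Hypothetical per-session token counts, one field per category named above."""
    context_reading: int
    coding: int
    tool_execution: int
    reasoning: int

    def total(self) -> int:
        # Sum of all categories for the session.
        return self.context_reading + self.coding + self.tool_execution + self.reasoning

    def shares(self) -> dict[str, float]:
        # Percentage of the session total spent in each category.
        total = self.total()
        if total == 0:
            return {name: 0.0 for name in vars(self)}
        return {name: round(100 * value / total, 1) for name, value in vars(self).items()}


# Made-up numbers, purely for illustration.
session = UsageBreakdown(context_reading=40_000, coding=30_000,
                         tool_execution=10_000, reasoning=20_000)
for category, percent in session.shares().items():
    print(f"{category}: {percent}%")
```

Even a simple report like this would show whether startup context reading is really what eats the allowance.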

Hi @Serdjio!

I added the feature-request tag to this topic because I understand how the current system can be confusing. There is a 5-hour time window, a number of tokens that can be spent within that window, and we only know after the fact how many of those tokens were used.

I am asking you to keep this constructive feedback contained to this topic. Not every report about Codex rate limits is related to this feature request.

The best path forward is to add your voice to the official Codex repo on GitHub. If other users agree, they can upvote it there, and the suggestion is more likely to become a priority.

Repeatedly sharing the same feature request across the forum will not help as much.

Thank you for your understanding.

Why would it?

The pricing page is quite clear, no?

https://chatgpt.com/codex/pricing/

It contains this table:

It is a 5 hour window in which you can send a maximum number of messages, after which time it resets.

If you want to increase that ceiling you can either buy additional credits or increase your plan.

The token limit varies by account tier. The 5 hour limit stays the same.
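
A rough way to picture this is a fixed window with a budget that resets when the window ends. The sketch below is only my mental model under stated assumptions: the `WindowBudget` class, the limit of 100 units, and the choice to start the window at the first request are all hypothetical, and the real accounting is not public in this detail.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60 * 60  # the 5-hour window described above


@dataclass
class WindowBudget:
    limit: int                 # hypothetical allowance per window (varies by tier)
    window_start: float = 0.0  # assumption: the window starts at the first request
    used: int = 0

    def _maybe_reset(self, now: float) -> None:
        # Once the window has elapsed, start a new one with a fresh budget.
        if now - self.window_start >= WINDOW_SECONDS:
            self.window_start = now
            self.used = 0

    def try_spend(self, amount: int, now: float) -> bool:
        # Returns False when the request would exceed the remaining budget.
        self._maybe_reset(now)
        if self.used + amount > self.limit:
            return False
        self.used += amount
        return True

    def seconds_until_reset(self, now: float) -> float:
        return max(0.0, self.window_start + WINDOW_SECONDS - now)


budget = WindowBudget(limit=100)
print(budget.try_spend(40, now=0))       # first heavy task
print(budget.try_spend(40, now=1800))    # second heavy task, 30 minutes in
print(budget.try_spend(40, now=3600))    # third task at the 1-hour mark is refused
print(budget.seconds_until_reset(now=3600) / 3600, "hours left to wait")
```

In this model the "5 hours" is the length of the window, not the amount of working time: a few expensive tasks can use up the budget in the first hour, and the rest of the window is waiting time.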

This is very similar to any SaaS.

It’s a well-trodden way of giving fair access so that systems are not overwhelmed and kept responsive whilst offering those that are willing to pay more a higher level of service.

How else should they manage it so it is both fair and protects the availability of the system?

Well, the obvious answer is “buy an expensive data plan” :joy: 150 messages? I would write 150 words in 30 messages, and the tokens would run out sooner. Thank you for your reply, I realized your forum is interesting. I’ve drawn my own conclusion. Good luck in your difficult work. :see_no_evil_monkey: :hear_no_evil_monkey: :speak_no_evil_monkey:

Yes that is the answer, I’m afraid.

It’s a professional tool. If you have a revenue stream from the software you create it is more justifiable to spend more.

We’ve seen tightening from Anthropic too and that has upset some. But what else can they do? They have to pay for the compute that is used.

Competition will help but everyone faces similar constraints on datacentre capacity.

You are absolutely right, and I understand that. A similar situation happened with Claude Code, but it seems they realized that limits need to be increased in order to retain customers.

I also agree with you that for many freelancers without a startup behind them, planning a large-scale project can be difficult; that is a fact. I was not criticizing the idea as if, for $20, I expected to build a ChatGPT-level corporation with a single prompt like: “Make me a company like ChatGPT” :grin: Although, to be fair, it is not a bad idea: for $100, I might actually try it :grin:

What I was talking about is something else: the limits run out very quickly. And, as it turns out, I am not the only one experiencing this. Judging by the feedback, this seems to be happening across all pricing plans.

P.S. I simply wanted more transparency: to understand how the limits actually work and how to plan my work around the plan I’m paying for. Especially since my current plan is already, let’s be honest, not exactly the most expensive one — more like the budget option for broke freelancers like me :grin:

I would like the same, but I have learned to treat much of the process as a black box and make the best use of the information that is available.

I am sure I could add more references over time as I remember them, but this is the first one that comes to mind:

https://openai.com/api/pricing/

Notice that output tokens are much more expensive than input tokens, and that there is also a category for cached input.

My practical takeaway is to keep output as concise as possible. If you are asking for extensive documentation, examples, or background material that you are unlikely to read, it may be worth saying so in the prompt so the model does not generate unnecessary output.

As for cached input, my interpretation is that cache misses are more expensive to serve, so some of that cost is reflected in pricing. In practical terms, if you submit a prompt and then step away for several minutes while the system is waiting for your next reply, you may be more likely to lose the benefit of cached context.
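
To illustrate why this matters, here is a small sketch that weights the three token categories differently. The weights are hypothetical placeholders I chose for the example, not real prices, and `weighted_cost` is my own helper; check the pricing page for actual rates.

```python
# Hypothetical relative weights per token (NOT real prices): output is the most
# expensive category and cached input the cheapest, as the pricing page suggests.
HYPOTHETICAL_WEIGHTS = {"input": 10, "cached_input": 1, "output": 80}


def weighted_cost(input_tokens: int, cached_input_tokens: int, output_tokens: int,
                  weights: dict[str, int] = HYPOTHETICAL_WEIGHTS) -> int:
    """Return a relative cost in arbitrary units for one request."""
    return (input_tokens * weights["input"]
            + cached_input_tokens * weights["cached_input"]
            + output_tokens * weights["output"])


# Same 10,000-token prompt, verbose answer vs. concise answer.
verbose = weighted_cost(10_000, 0, 4_000)
concise = weighted_cost(10_000, 0, 1_000)

# Same prompt and concise answer, but most of the input is served from cache.
cached = weighted_cost(2_000, 8_000, 1_000)

print(verbose, concise, cached)
```

With these made-up weights, trimming the output cuts the cost by more than half, and a warm cache lowers it further. The exact ratios will differ, but the direction should hold as long as output costs more than input and cached input costs less.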


Another useful area to understand is how thinking tokens work.

There is more than one paper and approach related to this, but one I remember off the top of my head is:

Training Large Language Models to Reason in a Continuous Latent Space
Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian

https://arxiv.org/pdf/2412.06769

I remember this one mainly because the keyword Coconut makes it easy to search for later.


Another item for the list I just remembered.

Unrolling the Codex agent loop By Michael Bolin, Member of the (OpenAI) Technical Staff


A better page for the cost of tokens, as noted in this reply:

https://help.openai.com/en/articles/20001106-codex-rate-card