How can we clearly see token usage per Codex conversation/task, especially in Codex web?

Hi OpenAI team and community,

I’m using Codex quite actively for software development workflows, especially with GitHub issues, repo analysis, implementation tasks, testing, and deployment preparation.

I can currently see the remaining percentage/credits of my Codex plan, but I cannot find a clear way in the Codex web/app interface to see how many tokens were consumed by a specific Codex conversation, task, or run.

From the documentation, I understand that Codex usage is now based on token usage: input tokens, cached input tokens, and output tokens. I also see that Codex Settings → Usage shows remaining credit/limits, and that in Codex CLI the /status command can show current token usage during a session.

However, what I would like to know is:

  1. Is there any way in Codex web/app to see token usage for a specific conversation, task, or run?

  2. Can I see how much of the context window is being used in a current Codex session?

  3. Can I see whether a task is becoming expensive because Codex has read too many files, kept too much previous conversation context, or generated too much output?

  4. Is there a recommended workflow to decide when to continue in the same Codex conversation versus starting a fresh one?

  5. Is there any log, export, or debug view that shows:

    • input tokens,

    • cached input tokens,

    • output tokens,

    • total credits consumed,

    • files read,

    • tool calls,

    • and context size per task?

The reason this matters is that many of us are trying to build disciplined agentic development workflows. For example, I’m trying to use one Codex conversation per GitHub issue, with small verifiable tasks, tests, and evidence comments. But without clear per-task token visibility, it’s difficult to know whether a conversation is becoming inefficient or whether the prompt/context should be reduced.

A useful improvement would be something like a “Task usage breakdown” panel in Codex showing:


Task / conversation usage:
- Input tokens
- Cached input tokens
- Output tokens
- Estimated credits consumed
- Context window used
- Files read
- Largest context contributors
- Recommendation: continue / start fresh / summarize context

Even a rough estimate would be very helpful. The current remaining-credit percentage is useful, but it does not help me understand which conversations or workflows are consuming the most tokens.

Is this currently possible somewhere, or is it only available through CLI/API/enterprise analytics?

Thanks.

First off, I am not an OpenAI employee.

I used ChatGPT and gave it some information. I knew it would not be able to break a Codex session or its usage down to an exact token count, but it can give you a clearer picture than "the tokens were here and now they are gone."

Here is part of the prompt I used:

read: https://help.openai.com/en/articles/20001106-codex-rate-card
read: https://community.openai.com/t/codex-rate-limits-discussion-thread/1378553/370
read: https://openai.com/index/unrolling-the-codex-agent-loop/
read: https://developers.openai.com/codex/pricing#what-are-the-usage-limits-for-my-plan 

The .codex directory should hold the details that can be used with the read documents to stitch together cost breakdowns.

You can add to the prompt but hopefully this will allow you to explore more on your own.

HTH


Note: I use Codex with VS Code via the Codex extension which creates and updates the .codex directory.

Most of your questions can be answered by inspecting the session details inside the Codex home folder, ~/.codex/sessions. This folder holds the details of each of your threads. If you don't feel like exploring these files manually, you can ask an agent to do it for you.

About your 4th question: I usually start a new thread when the next piece of work goes in a different direction. It also helps keep things organized. One of the reasons sub-agents are useful is that they help prevent context rot.

You can also build a custom UI dashboard that renders the data from the session files in that folder. I am building one, but it is still in progress.
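
As a starting point for such a dashboard, here is a minimal sketch that prints one row per session file. It assumes each file under ~/.codex/sessions is JSONL containing event_msg records whose payload has type token_count and an info.total_token_usage object, and that those totals are cumulative, so the last event in a file holds the session total. The function names (last_token_totals, summarize_sessions, format_table) are my own and not part of Codex, and the file layout may change between versions.

```python
import json
import os
from pathlib import Path


def last_token_totals(path):
    """Return the final total_token_usage dict in a session file, or None."""
    totals = None
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip blank, partial, or corrupt lines
            if not isinstance(obj, dict) or obj.get("type") != "event_msg":
                continue
            payload = obj.get("payload")
            if not isinstance(payload, dict) or payload.get("type") != "token_count":
                continue
            info = payload.get("info")
            if isinstance(info, dict) and isinstance(info.get("total_token_usage"), dict):
                totals = info["total_token_usage"]  # later events overwrite earlier ones
    return totals


def summarize_sessions(root):
    """Collect one row per session file that has at least one usage event."""
    root = Path(root)
    if not root.is_dir():
        return []
    rows = []
    for path in sorted(root.rglob("*.jsonl")):
        totals = last_token_totals(path)
        if totals is None:
            continue
        rows.append({
            "file": path.name,
            "input": int(totals.get("input_tokens") or 0),
            "cached": int(totals.get("cached_input_tokens") or 0),
            "output": int(totals.get("output_tokens") or 0),
            "total": int(totals.get("total_tokens") or 0),
        })
    # Most expensive sessions first (assumption: total_tokens is the figure of interest).
    rows.sort(key=lambda r: r["total"], reverse=True)
    return rows


def format_table(rows):
    """Render the rows as a fixed-width text table."""
    lines = [f"{'total':>12} {'input':>12} {'cached':>12} {'output':>10}  file"]
    for r in rows:
        lines.append(
            f"{r['total']:>12,} {r['input']:>12,} {r['cached']:>12,} {r['output']:>10,}  {r['file']}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    root = Path(os.environ.get("CODEX_HOME", Path.home() / ".codex")) / "sessions"
    print(format_table(summarize_sessions(root)))
```

Sorting by total tokens puts the heaviest threads at the top, which is usually the first thing you want to know. It does not convert tokens to credits; that would need the rate card figures.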

It is possible to peek at the local cache for token usage. Basically, there is a cache folder where Codex sessions are stored. It is either under your user folder on Windows/macOS or under your user folder inside WSL if you are using it.

Here is a small Python snippet to help you get started. It lists the latest 5 thread IDs and prints the usage progression for the latest thread (or for a specific thread if you set an ID):

#!/usr/bin/env python3
import json, os, re, sys
from datetime import datetime
from pathlib import Path

thread_id = ""  # set a thread ID (or a prefix of one) to inspect a specific thread; empty means the latest
UUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I)


def session_root():
    return Path(os.environ.get("CODEX_HOME", Path.home() / ".codex")) / "sessions"


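def read_jsonl(path):
    # Yield each JSON object in the file, skipping blank, malformed, or non-object lines.
    with path.open(encoding="utf-8") as f:
        for line in f:
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue
            if isinstance(obj, dict):
                yield obj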


def id_from_file(path):
    for obj in read_jsonl(path):
        if isinstance(obj.get("id"), str):
            return obj["id"]
        payload = obj.get("payload")
        if obj.get("type") == "session_meta" and isinstance(payload, dict) and isinstance(payload.get("id"), str):
            return payload["id"]
    match = UUID.search(path.stem)
    return match.group(0) if match else path.stem


def usage_item(obj):
    payload = obj.get("payload")
    return obj.get("type") == "event_msg" and isinstance(payload, dict) and payload.get("type") == "token_count"


def main():
    root = session_root()
    files = sorted(root.rglob("*.jsonl"), key=lambda p: p.stat().st_mtime, reverse=True)
    threads = [(id_from_file(p), p) for p in files]
    target = thread_id.strip()

    if not target:
        print("latest 5 threads:", file=sys.stderr)
        for tid, path in threads[:5]:
            ts = datetime.fromtimestamp(path.stat().st_mtime).isoformat(timespec="seconds")
            print(f"{ts}  {tid}  {path}", file=sys.stderr)
        target = threads[0][0] if threads else None
    if not target:
        raise SystemExit(f"no .jsonl sessions found under {root}")
    matches = [p for tid, p in threads if tid == target or tid.startswith(target) or p.stem.startswith(target)]
    if not matches:
        raise SystemExit(f"thread not found: {target}")

    print(f"\nusage items for {id_from_file(matches[0])}:", file=sys.stderr)
    for obj in read_jsonl(matches[0]):
        if usage_item(obj):
            print(json.dumps(obj, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    main()

Each usage item it prints looks similar to this:

{
  "timestamp": "2026-01-01T00:00:00.000Z",
  "type": "event_msg",
  "payload": {
    "type": "token_count",
    "info": {
      "total_token_usage": {
        "input_tokens": 120000,
        "cached_input_tokens": 100000,
        "output_tokens": 2500,
        "reasoning_output_tokens": 800,
        "total_tokens": 122500
      },
      "last_token_usage": {
        "input_tokens": 15000,
        "cached_input_tokens": 14000,
        "output_tokens": 150,
        "reasoning_output_tokens": 30,
        "total_tokens": 15150
      },
      "model_context_window": 200000
    },
    "rate_limits": {
      "limit_id": "example_limit_id",
      "limit_name": null,
      "primary": {
        "used_percent": 0.0,
        "window_minutes": 300,
        "resets_at": 1700000000
      },
      "secondary": {
        "used_percent": 1.0,
        "window_minutes": 10080,
        "resets_at": 1700600000
      },
      "credits": null,
      "plan_type": "example_plan",
      "rate_limit_reached_type": null
    }
  }
}
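
To turn one of these events into the numbers asked about in the original post (context window used, how much input was cached), a rough sketch follows. It assumes cached_input_tokens is a subset of input_tokens, which matches the sample above (input plus output equals total_tokens), and that last_token_usage.input_tokens approximates how much context the most recent request sent. Both are my reading of the data, not documented behavior, and usage_metrics is a hypothetical helper name.

```python
def usage_metrics(event):
    """Derive simple ratios from one token_count event shaped like the sample above."""
    info = (event.get("payload") or {}).get("info") or {}
    last = info.get("last_token_usage") or {}
    window = info.get("model_context_window") or 0
    input_tokens = last.get("input_tokens", 0)
    cached = last.get("cached_input_tokens", 0)
    return {
        # Assumption: the last request's input size stands in for current context size.
        "context_used_pct": round(100 * input_tokens / window, 1) if window else None,
        # Share of the last request's input that was served from cache.
        "cache_hit_pct": round(100 * cached / input_tokens, 1) if input_tokens else None,
        # Input tokens that were not cached (assumes cached is a subset of input).
        "uncached_input_tokens": input_tokens - cached,
    }


# Trimmed copy of the sample event above.
sample = {
    "type": "event_msg",
    "payload": {
        "type": "token_count",
        "info": {
            "last_token_usage": {
                "input_tokens": 15000,
                "cached_input_tokens": 14000,
                "output_tokens": 150,
            },
            "model_context_window": 200000,
        },
    },
}

print(usage_metrics(sample))
# {'context_used_pct': 7.5, 'cache_hit_pct': 93.3, 'uncached_input_tokens': 1000}
```

Watching context_used_pct climb across the events of one thread gives a rough signal for when to summarize or start a fresh thread.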

Have you considered using hooks?

Yes, I am trying to get you to do the work while I just drop suggestions. 🙂