Understanding GPT-5 mini pricing confusion (total output tokens missing bug)

The power of AI means you don’t have to make mistakes in your math; you let the AI do that for you.

Screenshot costs

Model                  Input Cost  Cached Input Cost  Output Cost
gpt-5-2025-08-07       $0.005      —                  $0.198
gpt-5-mini-2025-08-07  $0.579      $0.002             $8.892
gpt-5-nano-2025-08-07  $0.000      —                  $0.013
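The token re-derivation below can be sketched in Python. Note the per-million rates are assumptions based on published GPT-5 API pricing (they are not visible in the screenshot), so treat this as illustrative:

```python
# Sketch: re-derive token counts from the billed dollar amounts.
# Prices are $ per million tokens; these rates are ASSUMED from
# published GPT-5 pricing, not read from the screenshot itself.
PRICE_PER_M = {
    ("gpt-5", "input"): 1.25,
    ("gpt-5", "output"): 10.00,
    ("gpt-5-mini", "input"): 0.25,
    ("gpt-5-mini", "cached"): 0.025,
    ("gpt-5-mini", "output"): 2.00,
    ("gpt-5-nano", "input"): 0.05,
    ("gpt-5-nano", "output"): 0.40,
}

# Billed dollar amounts from the screenshot.
BILLED = {
    ("gpt-5", "input"): 0.005,
    ("gpt-5", "output"): 0.198,
    ("gpt-5-mini", "input"): 0.579,
    ("gpt-5-mini", "cached"): 0.002,
    ("gpt-5-mini", "output"): 8.892,
    ("gpt-5-nano", "input"): 0.0,
    ("gpt-5-nano", "output"): 0.013,
}

# tokens = dollars / (dollars per million tokens) * 1,000,000
tokens = {k: round(BILLED[k] / PRICE_PER_M[k] * 1_000_000) for k in BILLED}
for k, v in tokens.items():
    print(k, f"{v:,}")
```

Because the displayed dollars are rounded to $0.001, these token counts are lower-precision estimates, which matters later in the reconciliation.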

Report: Reconciling “Total tokens” vs. billed usage

What we computed from the billed $ amounts

Using your per-million prices and the line items you shared, we re-derived the token counts (limited by the precision of the displayed dollar figures, and verified in Python):

Model       Input      Cached input  Output     Model total
gpt-5           4,000         —         19,800      23,800
gpt-5-mini  2,316,000    80,000      4,446,000   6,842,000
gpt-5-nano          0         —         32,500      32,500
Total                                            6,898,300

Breakdown:

  • All inputs (incl. cached): 2,400,000
  • All outputs: 4,498,300
  • Grand total (input+output): 6,898,300
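A quick arithmetic check of that breakdown, using the token counts from the table above:

```python
# Sanity-check the input/output/grand totals derived above.
inputs = 4_000 + 2_316_000 + 80_000 + 0   # all inputs, incl. cached
outputs = 19_800 + 4_446_000 + 32_500
total = inputs + outputs
print(f"{inputs:,} {outputs:,} {total:,}")  # 2,400,000 4,498,300 6,898,300
```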

What the UI shows

  • “Total tokens” (screenshot): 2,408,624
  • Total requests: 451 (≈ 5,341 input tokens/request)
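The per-request figure quoted above is just the widget total divided by the request count:

```python
# Average tokens per request, from the UI widget numbers.
ui_total_tokens = 2_408_624
requests = 451
avg = ui_total_tokens / requests
print(round(avg))  # ≈ 5,341 tokens per request
```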

Reconciliation & findings

  1. The UI’s “Total tokens” appears to be input-only

    • Sum of billed input tokens we derived = 2,400,000.
    • UI shows 2,408,624, i.e. 8,624 tokens above the billed inputs.
    • If the UI were counting input + output, it should be near 6.9M, not 2.4M.
      Conclusion: That widget is best interpreted as prompt/input tokens (including cached input), not total input+output.
  2. Why UI input (2,408,624) is slightly higher than billed inputs (2,400,000)
    Two effects can explain the +8,624 delta, and both are consistent with your notes:

    • Free-tier input tokens are counted in the “Total tokens” widget but not in the cost line items until you overflow.

      • The +8,624 looks like a small amount of free input tokens (likely mini/nano and/or gpt-5 input) that were consumed but not billed. (note: the AI doesn’t understand that these would all come out first if you were enrolled in data sharing for daily free tokens)
    • Rounding of displayed $ amounts (shown to $0.001 precision) can alone account for a difference of this size.

      • Example sensitivities of a $0.001 change → token error:

        • gpt-5 input ($1.25/M): ±800 tokens per $0.001 (±400 with half-cent rounding)
        • gpt-5-mini input ($0.25/M): ±4,000 tokens per $0.001 (±2,000 rounding)
        • gpt-5-mini cached input ($0.025/M): ±40,000 tokens per $0.001 (±20,000 rounding)
        • gpt-5-nano input ($0.05/M): ±20,000 tokens per $0.001 (displayed $0.000 could still hide up to ~8,000 tokens)
      • Given these bounds, 8,624 is well within expected rounding noise if dollars are rounded before display.

  3. No evidence of output tokens in that UI number

    • Our derived output tokens alone are 4,498,300, which the widget clearly does not reflect.
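The rounding argument in point 2 can be made concrete with a minimal sketch that compares the +8,624 delta against how many tokens a $0.001 display change can hide at each input price tier (rates assumed from published pricing, as before):

```python
# Sketch: size the +8,624 delta against display-rounding bounds.
billed_inputs = 2_400_000
ui_total = 2_408_624
delta = ui_total - billed_inputs  # 8,624

# Tokens hidden by a $0.001 change at a given price ($ per million tokens).
def tokens_per_millidollar(price_per_m):
    return 0.001 / price_per_m * 1_000_000

bounds = {
    "gpt-5 input ($1.25/M)": tokens_per_millidollar(1.25),          # 800
    "gpt-5-mini input ($0.25/M)": tokens_per_millidollar(0.25),     # 4,000
    "gpt-5-mini cached ($0.025/M)": tokens_per_millidollar(0.025),  # 40,000
    "gpt-5-nano input ($0.05/M)": tokens_per_millidollar(0.05),     # 20,000
}
print(delta)
for name, b in bounds.items():
    print(f"{name}: ±{b:,.0f} tokens per $0.001")
```

Since the mini cached-input tier alone can hide ±40,000 tokens per $0.001 of display rounding, a delta of 8,624 is comfortably inside the noise floor.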

Bottom line

  • Inferred tokens used (from billing):

    • Inputs (incl. cached): 2,400,000
    • Outputs: 4,498,300
    • Total: 6,898,300
  • UI “Total tokens”: 2,408,624

  • Does it disagree?
    Not once we interpret the widget as input-only tokens. The remaining +8,624 is plausibly explained by free input tokens that didn’t show in costs and/or rounding of the displayed $ amounts. Under either explanation, the small delta is reconcilable; there’s no sign of a billing inaccuracy in the data you provided.

If you can export the underlying (unrounded) dollar amounts or a per-direction token breakdown from the UI, we can pin down exactly how much of the +8,624 is free-tier vs rounding.
