Go to this usage page:
https://platform.openai.com/usage/chat-completions
Although it is labeled “chat completions”, it also includes calls made through the Responses API.
Be sure to clear any per-project filter at the upper right, next to the date range:

Then, where it says “Input tokens” at the top left, it is not at all obvious that this is a drop-down, nor that you need to check each metric you want drawn as a bar graph.
Choose those selections carefully, or the chart will count the same tokens twice. For example, cached input and uncached input add up to input tokens, so checking all three shows double your actual input usage (silly). Checking just “input tokens” and “output tokens” is enough (reasoning tokens are already counted as output tokens).
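To illustrate the double-counting, here is a trivial sketch; the figures are made up and the variable names are only illustrative, not the dashboard's field names:

```python
# Hypothetical per-day token figures, for illustration only
input_cached = 40_000     # input tokens served from the prompt cache
input_uncached = 60_000   # input tokens billed at the full rate
input_tokens = input_cached + input_uncached   # 100_000 real input tokens

# Checking all three boxes stacks them in the same chart:
stacked = input_cached + input_uncached + input_tokens
print(stacked)  # 200_000 -- double the actual input usage
```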
For the period selected, you can then hover over individual days (bucketed at UTC 00:00) and also see the total tokens for the period:

You can then pick “group by model” and finally drill down to the token counts. With only output tokens checked in the drop-down, grouped by model, over a small date range, the per-day hover at last starts to deliver your output consumption.
What is beyond frustrating: only that main bar graph will show output tokens. Regardless of your checked selections, the per-model usage graphs below it STILL show only input tokens.
You’ll then have to apply the per-model prices to those token counts yourself, and go across the other usage surfaces to make sure no alternate project or endpoint is separately reporting gpt-5 models. Tedious, probably on purpose.
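If you want to turn the hovered token counts into dollars, here is a minimal sketch of that arithmetic. The per-million-token prices are placeholders; substitute the currently published rates for your models:

```python
# Rough cost estimate from dashboard token counts.
# Prices are placeholders in USD per 1M tokens -- check the current published pricing.
PRICES = {
    "gpt-5": {"input": 1.25, "cached_input": 0.125, "output": 10.00},
}

def estimate_cost(model: str, input_tokens: int, cached_input_tokens: int,
                  output_tokens: int) -> float:
    """Return an estimated USD cost for one model from dashboard token totals."""
    p = PRICES[model]
    uncached = input_tokens - cached_input_tokens  # cached tokens are billed at a discount
    return (
        uncached * p["input"]
        + cached_input_tokens * p["cached_input"]
        + output_tokens * p["output"]
    ) / 1_000_000

# Example: numbers read off the per-day hovers for a date range
print(f"${estimate_cost('gpt-5', 1_200_000, 300_000, 450_000):.2f}")
```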
So: your billing is probably right and just needs correct interpretation, except where OpenAI is:
- not delivering a gpt-5 cache discount when the calls are designed to be cacheable,
- billing image input to gpt-5 at high detail when “low” was requested,
- the Playground and the “prompt preset” billing you while getting no output from gpt-5, because presets store a bad max_output_tokens value of 2048 instead of “unlimited”, cutting the response off while still within reasoning (a request-level workaround is sketched after this list).
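If you are sending requests yourself, you can sidestep the bad stored preset and the image-detail question by setting both explicitly on the call. This is a minimal sketch using the Python SDK’s Responses interface; the image URL and the token limit are placeholder values to adapt:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    max_output_tokens=16000,  # set explicitly; don't rely on a preset's stored 2048
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Describe this image briefly."},
                {
                    "type": "input_image",
                    "image_url": "https://example.com/photo.jpg",  # placeholder URL
                    "detail": "low",  # request low image detail explicitly
                },
            ],
        }
    ],
)

print(response.output_text)
```

Whether the low-detail request is actually billed as low is a separate question, but at least the parameters you intended are on the wire, and a generous max_output_tokens keeps reasoning from eating the whole budget before any visible output.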