Completion tokens for the o1 models consist of both the reasoning tokens and the tokens for the actual / visible response. You are charged for both.
It is also addressed in the documentation and disclosed on the OpenAI pricing page.
Based on my own tests, predominantly with o1-preview, I can confirm that the reasoning token count tends to be significantly higher than the token count of the actual response. Your example is consistent with this.
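For anyone who wants to check this on their own requests, here is a minimal sketch using the OpenAI Python SDK. The prompt is just an illustration; the `usage.completion_tokens_details.reasoning_tokens` field is what exposes the split at the time of writing, though field names may change as the API evolves:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "Reply with a single word: ready?"}],
)

usage = response.usage
details = usage.completion_tokens_details  # may be absent on non-reasoning models

# completion_tokens covers BOTH the hidden reasoning tokens and the visible answer,
# and you are billed for the full amount.
print("total completion tokens:", usage.completion_tokens)
print("reasoning tokens (billed, never returned):", details.reasoning_tokens)
print("visible answer tokens:", usage.completion_tokens - details.reasoning_tokens)
```

Even for a one-word answer like the prompt above, the reasoning token count typically dwarfs the visible output.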
Thanks @jr.2509
It actually makes sense to pay for the internal reasoning as well. It can be a significant extra cost for OpenAI, and it's not difficult to write a prompt that generates little output but requires a lot of processing power.
It shouldn’t be fine print, though. It should be written in HUGE fonts.