When I call the chat.completions.parse API, it returns
completion.usage.total_tokens
completion.usage.prompt_tokens_details.cached_tokens
I understand that cached_tokens comes from prompt caching, but does OpenAI charge for these cached tokens? How can I calculate the correct cost based on the returned usage info?
For example, the first call returns
total_tokens = 4096
cached_tokens = 0
and the second call returns
total_tokens = 8192
cached_tokens = 4096
Hi! I understand how the usage returned by the API can be a bit confusing, with the new detail fields accompanying the preexisting prompt and completion token values.
Made more presentable, the API usage information is reported roughly as in the sketch below (the field names as they appear on the Python SDK response object, with the numbers from your second call filled in):
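```python
# Sketch of the usage object on a chat completion (Python SDK attribute
# access); the concrete numbers are the ones from your second call.
usage = completion.usage
usage.prompt_tokens                         # input tokens, cached + uncached
usage.completion_tokens                     # generated output tokens
usage.total_tokens                          # prompt_tokens + completion_tokens = 8192
usage.prompt_tokens_details.cached_tokens   # prompt tokens served from cache = 4096
```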
If the prompt caching system is able to reuse some of its precomputed state on later similar calls, then that reused token count will appear in the API's cached_tokens field. That is the number of prompt tokens billed at a discount, typically 50%: in your second call, 4096 of the prompt tokens would be billed at the cached rate and only the remainder at the full input rate.
Since OpenAI also explicitly states a cached-token price for each model on its pricing page, it is safer not to assume the discount is always exactly 50% (it could change or differ per model) and instead to use that published dollar price directly as the formula input.
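As a minimal sketch, the per-call cost can then be computed from the usage object like this. The dollar figures below are placeholders, not real prices; substitute the per-million-token rates published for your model:

```python
# Placeholder per-million-token prices -- substitute the published rates
# for your model from OpenAI's pricing page before relying on this.
INPUT_PRICE_PER_M = 2.50          # uncached prompt tokens
CACHED_INPUT_PRICE_PER_M = 1.25   # cached prompt tokens (the discounted rate)
OUTPUT_PRICE_PER_M = 10.00        # completion tokens

def request_cost_usd(usage) -> float:
    """Dollar cost of one call, splitting the prompt into cached/uncached parts."""
    details = usage.prompt_tokens_details
    cached = details.cached_tokens if details else 0
    uncached = usage.prompt_tokens - cached
    return (
        uncached * INPUT_PRICE_PER_M
        + cached * CACHED_INPUT_PRICE_PER_M
        + usage.completion_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
```

Keeping the cached price as its own constant means that if OpenAI changes the discount for a model, you only update one number rather than the formula.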