I’m using the GPT-4o-mini-TTS model, but since the response doesn’t return usage data, I have no way of knowing the number of input or output tokens.
For the moment, the speech API doesn’t return usage information.
But its costs are aproximately: 1k input characters ~= 1 minute ~= $0.015 (for english).
If you need exact usage, an alternative is using the gpt-4o-mini-audio-preview, which returns more detailed usage.
You can add some system prompt to make it behave like a TTS endpoint:
Echo the exact text sent to you in the user prompt, with no extra responses
Try it in the playground for more details.
And I forgot to mention, in the usage dashboard if you export csv data you can see the sumarized input tokens, but not for individual requests.
I’m not always able to access the usage dashboard, so it’s difficult for me to keep track of detailed usage information. I’ve reviewed the pricing and found the following:
- GPT-4o Mini TTS:
- $0.60 per 1M input tokens
- $12.00 per 1M output tokens, or $0.015 per minute
Given this, I’d like to confirm: how many input tokens am I actually sending?
instructions+input text (you can have a rough idea at tokenizer)
But the text tokens are almost negligible, you pay mostly for the generated audio in length.
I understand now. Thank you for your assistance