When the API responds that you’ve exceeded the maximum token length for a specific model, are we charged for the tokens in the request prompt plus the response tokens that were never returned? Or are we charged just for the request prompt? Or are we not charged at all when the error occurs?
The request never makes it to an AI model; it is stopped by the tokenizer and the endpoint. So it makes no sense to be billed any more than for any other input error you created. Otherwise I’d be on the hook for a max_tokens of 12345678.
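Since the endpoint rejects oversized requests before any model runs, you can catch the same condition locally with a pre-flight check. Here is a minimal sketch; the ~4-characters-per-token heuristic and the `check_request` helper are my own illustration, not part of the API (an actual tokenizer such as tiktoken would give exact counts):

```python
# Hypothetical pre-flight check: catch an oversized request locally
# instead of letting the endpoint reject it.
# Assumes a crude ~4 characters-per-token estimate; a real tokenizer
# (e.g. tiktoken) gives exact counts.

CONTEXT_WINDOW = 16_385  # gpt-3.5-turbo-16k context length

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def check_request(prompt: str, max_tokens: int) -> bool:
    """True if estimated prompt tokens + max_tokens fit the window."""
    return estimate_tokens(prompt) + max_tokens <= CONTEXT_WINDOW

print(check_request("Hello", max_tokens=4813))      # small prompt: fits
print(check_request("x" * 100_000, max_tokens=1))   # far too long: rejected
```

Note the check mirrors how the limit works: the prompt tokens and the requested `max_tokens` must fit in the context window together, so an absurd `max_tokens` fails even with a tiny prompt.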
ps: I’ve got a new billing game to play: minimum input, maximum output. Only costs $0.07 if you win.
gpt-3.5-turbo-16k-0613, 1 request
11 prompt + 4,813 completion = 4,824 tokens
Wow, I can’t even begin to imagine what prompt could give that result!
Thanks for your response, @_j. How do you know, though? Are there any official resources with more information on this?
One can simply make a bad call and then see that it is not added to the every-five-minute display in your daily usage.