I don’t mean to be dense here, apologies.
OpenAI’s documentation defines Token Limits as:
“Depending on the model used, requests can use up to 4097 tokens shared between prompt and completion. If your prompt is 4000 tokens, your completion can be 97 tokens at most.”
Here is how max_tokens is defined:
“The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length.”
Elsewhere it’s described as “the maximum number of tokens required to complete the response.”
In other words, to me that sounds like my prompt (input) plus the model’s response (output) together add up to the token limit for the model. It’s also what ChatGPT tells me when I ask it directly.
With that in mind, the literature at least seems to say that max tokens covers what you put in with your prompt plus what the model puts out in its response.
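Just to show how I’m reading that arithmetic, here’s a minimal token-counting sketch along the lines of the docs’ example (the model name, prompt, and the 4097 figure are just placeholders/assumptions for illustration):

```python
# Sketch only: count prompt tokens with tiktoken and see what's
# left for the completion, per the 4097-token example in the docs.
import tiktoken

MODEL = "gpt-3.5-turbo"   # assumed model, for illustration only
CONTEXT_LIMIT = 4097      # the limit quoted in the docs

prompt = "Some placeholder prompt text..."

encoding = tiktoken.encoding_for_model(MODEL)
prompt_tokens = len(encoding.encode(prompt))

# On my reading, whatever is left over is all the completion can use.
room_for_completion = CONTEXT_LIMIT - prompt_tokens
print(prompt_tokens, room_for_completion)
```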
I don’t doubt the procedure you’re mentioning is how it works. Sounds like you know a lot about it.
I’m just trying to get past an error I keep hitting over and over, where my input is only a fraction of max_tokens, and my output added to it shouldn’t come anywhere near the limit.
There’s got to be something I’m missing. I keep getting this error regardless of how I reset the context length, and whether I call gpt-4 or even gpt-3.5-16k. It just doesn’t add up.
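For context, here’s roughly the shape of the call that keeps failing (a minimal sketch with the openai Python package; the key, prompt, and max_tokens value are placeholders, not my real ones):

```python
import openai

openai.api_key = "sk-..."  # placeholder

# The actual prompt is only a few hundred tokens, nowhere near the limit.
messages = [{"role": "user", "content": "A short prompt, well under the limit."}]

response = openai.ChatCompletion.create(
    model="gpt-4",     # the same error shows up with gpt-3.5-16k too
    messages=messages,
    max_tokens=2000,   # placeholder value; the replies never get close to this
)

print(response["choices"][0]["message"]["content"])
```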
Have you faced this in your API calls? Any advice about it?