Doubt on prompt tokens and completion tokens

After successfully calling the OpenAI chat completions API, the response includes a usage object like the one below (model is gpt-3.5-turbo):
"usage": {
  "prompt_tokens": 5807,
  "completion_tokens": 312,
  "total_tokens": 6119
},
My question is about tokens. I understand that completion tokens are the tokens generated by OpenAI, with a maximum length of 4,096.
Prompt tokens are the tokens we feed the API as input. What is their maximum length? That is my first question.
My second question is about total_tokens, which is the sum of prompt and completion tokens. Is there a maximum length for this key too?

Limitations for prompt tokens are a function of the model's context window. Different models have different context windows, e.g. the gpt-4-turbo model series has a context window of 128,000 tokens while the regular gpt-4 model has a context window of 8,192, etc. You can find the breakdown by model in the overview here.

The sum of prompt and completion tokens cannot exceed the context window. However, as you rightly understand, completion tokens are currently limited to 4,096 tokens. Hence, the maximum number of prompt tokens is the difference between the context window and the completion tokens.
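That relationship is simple arithmetic, and can be sketched as follows (the context-window figures are the ones quoted above; the 16,385 figure for gpt-3.5-turbo comes from the API error message quoted later in this thread):

```python
# Largest prompt that still leaves room for a full-length completion,
# given a model's context window and the 4,096-token completion limit.
CONTEXT_WINDOWS = {
    "gpt-4-turbo": 128_000,
    "gpt-4": 8_192,
    "gpt-3.5-turbo": 16_385,
}
MAX_COMPLETION_TOKENS = 4_096

def max_prompt_tokens(model: str) -> int:
    """Maximum prompt size that leaves room for a maximal completion."""
    return CONTEXT_WINDOWS[model] - MAX_COMPLETION_TOKENS

print(max_prompt_tokens("gpt-4"))        # 4096
print(max_prompt_tokens("gpt-4-turbo"))  # 123904
```

So for gpt-4, a prompt larger than about 4,096 tokens already starts eating into the space available for the completion.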

You can also find official information about tokens here and here.


Also a bit curious: if you send the max_tokens API parameter, the value doesn't just limit the output, it also acts as a reservation of tokens from the context length that is set aside exclusively for the output (enforced by an API error).

using gpt-3.5-turbo

Example with max_tokens:20 parameter:

{'completion_tokens': 20, 'prompt_tokens': 15007, 'total_tokens': 15027}

Same with max_tokens:2000:

'message': "This model's maximum context length is 16385 tokens. However, you requested 17007 tokens (15007 in the messages, 2000 in the completion)

But if left unspecified, you get a cutoff at 4096:

{'completion_tokens': 32, 'prompt_tokens': 15007, 'total_tokens': 15039}

Python example demonstrating the "reservation blocking":

import openai

client = openai.Client()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    # Repeat a short question to inflate the prompt past ~15,000 tokens
    messages=[{"role": "user", "content": "Do chimps laugh?" * 3000}],
    max_tokens=2000,  # reserves 2000 tokens of context for the output
)
print(response.choices[0].message.content, "\n", response.usage.model_dump())
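The check behind that error can be sketched locally, without calling the API (the 16,385 constant and the validation logic here are inferred from the error message above, not taken from the real server code):

```python
# Illustrative sketch of the validation the API appears to apply:
# prompt tokens plus the max_tokens reservation must fit in the context window.
CONTEXT_WINDOW = 16_385  # gpt-3.5-turbo, per the error message above

def check_request(prompt_tokens: int, max_tokens: int) -> None:
    """Raise if the prompt plus the output reservation exceeds the window."""
    requested = prompt_tokens + max_tokens
    if requested > CONTEXT_WINDOW:
        raise ValueError(
            f"This model's maximum context length is {CONTEXT_WINDOW} tokens. "
            f"However, you requested {requested} tokens "
            f"({prompt_tokens} in the messages, {max_tokens} in the completion)"
        )

check_request(15_007, 20)       # passes: 15027 fits in the window
# check_request(15_007, 2_000)  # would raise, matching the error above
```

This matches the observed behavior: the request is rejected up front based on the reservation, even if the model would have produced far fewer than max_tokens tokens.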