Max tokens for chat completions with GPT-4o

When I send a chat completion request to GPT-4o with the following body:

{
	"frequency_penalty" : 0,
	"max_tokens" : 32000,
	"messages" : 
	[
		{
			"content" : "What is the best thing to do with 2lbs of hamburger?",
			"role" : "user"
		}
	],
	"model" : "gpt-4o",
	"temperature" : 0.7,
	"top_p" : 1
}

I get back:

{
  "error": {
    "message": "max_tokens is too large: 32000. This model supports at most 4096 completion tokens, whereas you provided 32000.",
    "type": null,
    "param": "max_tokens",
    "code": null
  }
}

My understanding was that max_tokens could be as large as the context window, which is 128k for GPT-4o.
What am I missing?
From the Chat Completions API documentation:

max_tokens

integer or null

Optional

The maximum number of tokens that can be generated in the chat completion.

The total length of input tokens and generated tokens is limited by the model’s context length. Example Python code for counting tokens.
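The documentation's token-counting example uses the tiktoken library. As a dependency-free stand-in, a rough heuristic (roughly 4 characters per token for English text — an approximation, not an exact count) can at least sanity-check whether prompt plus requested completion fits the context window:

```python
def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text.
    For exact counts, use the tiktoken library as the docs suggest."""
    return max(1, len(text) // 4)

def fits_context(prompt: str, max_tokens: int, context_window: int = 128_000) -> bool:
    """Check that estimated prompt tokens plus the requested number of
    completion tokens stay within the model's context window."""
    return rough_token_estimate(prompt) + max_tokens <= context_window
```

Note that this only checks the context-window constraint; as the error above shows, the completion side has its own, separate cap.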

The maximum number of tokens…

that can be generated

max_tokens sets the output length (which used to be unpredictable instead of artificially limited).

The total number of tokens for gpt-4o including prompt and completion tokens is 128,000. But the maximum number of completion tokens is 4,096. Unfortunate, as that seriously limits our use case.
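A minimal sketch of guarding requests against that per-model completion cap before sending. The limit values come from the error message above and the model docs; they change over time, so treat them as assumptions and verify against the current documentation:

```python
# Per-model completion-token caps (assumed values, taken from the error
# message in this thread and OpenAI's model table; check current docs).
COMPLETION_TOKEN_LIMITS = {
    "gpt-4o": 4_096,
    "gpt-4o-2024-08-06": 16_384,
}

def clamp_max_tokens(model: str, requested: int) -> int:
    """Clamp max_tokens to the model's completion cap so the request
    is not rejected outright; unknown models pass through unchanged."""
    limit = COMPLETION_TOKEN_LIMITS.get(model)
    return min(requested, limit) if limit is not None else requested
```

With the request from the original post, `clamp_max_tokens("gpt-4o", 32000)` returns 4096, which the API will accept.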

This is also not viable for our use case.
The limit should be at least 32,000 tokens.
Who would use the API if it only returns 4,096 tokens?
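One common workaround when the completion cap is smaller than the output you need is to ask the model to continue whenever it stops with finish_reason == "length". A hedged sketch, written against a generic complete(messages, max_tokens) callable rather than any specific SDK version, so you can wire it to whatever client you use:

```python
def generate_long(complete, messages, max_tokens=4_096, max_rounds=10):
    """Repeatedly request continuations until the model stops for a
    reason other than hitting the completion-token cap.

    `complete` is assumed to be a callable that sends one chat
    completion request and returns (text, finish_reason).
    """
    parts = []
    for _ in range(max_rounds):
        text, finish_reason = complete(messages, max_tokens)
        parts.append(text)
        if finish_reason != "length":
            break
        # "length" means the output was truncated by max_tokens, so feed
        # the partial answer back and ask the model to keep going.
        messages = messages + [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Continue exactly where you left off."},
        ]
    return "".join(parts)
```

Stitching continuations together is not seamless (the model can repeat or drift at the seams), but it does get past the hard cap.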


According to the docs, the August snapshot should support up to 16,384 output tokens:

| Model | Description | Context window | Max output tokens | Training data |
| --- | --- | --- | --- | --- |
| gpt-4o-2024-08-06 | Latest snapshot that supports Structured Outputs | 128,000 tokens | 16,384 tokens | Up to Oct 2023 |
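So the original request could be adjusted to pin that snapshot and stay within its cap. A sketch of the corrected body, assuming the limits quoted above are still current:

```json
{
	"model" : "gpt-4o-2024-08-06",
	"max_tokens" : 16384,
	"messages" :
	[
		{
			"content" : "What is the best thing to do with 2lbs of hamburger?",
			"role" : "user"
		}
	],
	"temperature" : 0.7,
	"top_p" : 1,
	"frequency_penalty" : 0
}
```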